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Abstract 

Standard  anomaly  detectors  and  classifiers  assume  data  to  be  uncorrelated  and 
homogeneous,  which  is  not  inherent  in  Hyperspectral  Imagery  (HSI).  To  address  the 
detection  difficulty,  a  new  method  tenned  Iterative  Linear  RX  (ILRX)  uses  a  line  of 
pixels  which  shows  an  advantage  over  RX,  in  that  it  mitigates  some  of  the  effects  of 
correlation  due  to  spatial  proximity;  while  the  iterative  adaptation  from  Iterative  Linear 
RX  (IRX)  simultaneously  eliminates  outliers. 

In  this  research,  the  application  of  classification  algorithms  using  anomaly 
detectors  to  remove  potential  anomalies  from  mean  vector  and  covariance  matrix 
estimates  and  addressing  non-homogeneity  through  cluster  analysis,  both  of  which  are 
often  ignored  when  detecting  or  classifying  anomalies,  are  shown  to  improve  algorithm 
performance. 

Global  anomaly  detectors  require  the  user  to  provide  various  parameters  to 
analyze  an  image.  These  user-defined  settings  can  be  thought  of  as  control  variables  and 
certain  properties  of  the  imagery  can  be  employed  as  noise  variables.  The  presence  of 
these  separate  factors  suggests  the  use  of  Robust  Parameter  Design  (RPD)  to  locate 
optimal  settings  for  an  algorithm.  This  research  extends  the  standard  RPD  model  to 
include  three  factor  interactions.  These  new  models  are  then  applied  to  the  Autonomous 
Global  Anomaly  Detector  (AutoGAD)  to  demonstrate  improved  setting  combinations. 
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TOWARDS  THE  MITIGATION  OF  CORRELATION  EFFECTS 
IN  THE  ANALYSIS  OF  HYPERSPECTRAL  IMAGERY  WITH 
EXTENSIONS  TO  ROBUST  PARAMETER  DESIGN 


1  Introduction 


1.1  Background 

Hyperspectral  Imagery  (HSI)  is  a  method  used  to  collect  contiguous  data  across  a 
large  swath  of  the  electromagnetic  spectrum,  which  is  accomplished  by  using  a 
specialized  camera  mounted  on  an  aircraft  or  satellite  to  take  a  picture  of  the  required 
area,  thereby  recording  the  magnitude  of  the  bands  within  the  collected  wavelengths. 
Typically,  HSI  encompasses  the  visible  to  infrared  regions  of  the  spectrum,  containing 
anywhere  from  more  than  20  to  250  plus  spectral  bands,  whereas  standard  digital  cameras 
capture  three  bands:  red,  green,  and  blue.  The  electromagnetic  spectrum,  shown  in 
Figure  1,  is  comprised  of  various  wavelengths,  measured  in  micrometers  (pm)  or 
nanometers  (nm),  commonly  by  the  visible  region,  but  also  includes  X-rays,  ultraviolet, 
infrared,  micro-waves,  etc.  (Landgrebe,  2003). 
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Figure  1:  The  Electromagnetic  Spectrum  (Landgrebe,  2003) 
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When  dealing  with  HSI  data,  anomaly  detection  is  used  to  find  objects  of  interest 
within  the  image  by  locating  pixels  that  are  statistically  different  from  the  background. 
The  vast  amount  of  data  contained  in  HSI  affords  a  great  opportunity  to  detect  anomalies 
in  an  image  using  standard  multivariate  statistical  techniques,  as  each  material  reflects 
individual  wavelengths  of  the  spectrum  differently.  Figure  2  shows  a  spectral  space  plot 
of  water,  trees,  and  soil.  This  gives  a  good  visual  representation  of  how  various  materials 
reflect  individual  wavelengths.  The  three  plots  across  the  entire  spectrum  shown  are  very 
different.  However,  there  are  regions  where  they  overlap  and  become  indistinguishable. 
This  highlights  the  benefit  of  collecting  a  vast  amount  of  wavelengths  over  the  three  used 
for  a  standard  color  image.  However,  the  large  amount  of  data  contained  within  each 
image  often  requires  dimensionality  reduction/feature  selection  techniques  to  be 
employed  such  that  analysis  of  the  image  data  operates  on  lower  dimensional, 
uncorrelated  data  (Landgrebe,  2002). 
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When  a  hyperspectral  image  is  collected,  the  data  is  stored  in  a  three-dimensional 
matrix,  referred  to  as  an  image  cube  or  data  cube,  displayed  in  Figure  3.  The  first  two 
dimensions  of  the  data  cube  correspond  to  the  location  of  the  pixel  in  the  image,  and  the 
third  dimension  represents  the  different  spectral  bands  that  were  collected  (Landgrebe, 
2003).  Prior  to  processing  an  image,  for  anomaly  detection  or  classification,  it  is  usually 
transformed  into  a  data  matrix.  A  data  matrix  consists  of  an  n  x  p  matrix  where  n  is  the 
total  number  of  pixels  in  the  image  consisting  of  p  spectral  bands,  therefore  a  single  pixel 
is  represented  by  a  1  x  p  vector. 
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Figure  3:  Image  Cube 


Current  anomaly  detectors,  such  as  the  RX  anomaly  detector  created  by  Reed  and 
Yu  (1990),  are  likely  to  have  a  high  false  positive  detection  rates  because  they  assume  the 
data  is  modeled  with  a  Gaussian  distribution.  However,  it  is  has  been  shown  that 
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hyperspectral  data  is  not  often  unimodal  (Banerjee  et  al.,  2006).  Further,  to  compound 
the  non-Gaussian  difficulty,  the  data,  by  its  very  nature,  is  correlated  and  heterogeneous. 
There  are  four  main  correlation  problems  inherent  to  HSI  that  if  addressed  properly  could 
potentially  benefit  anomaly  detection  and  classification:  spatial  correlation  (correlation 
between  pixels  due  to  proximity),  spectral  correlation  (correlation  between  spectral 
bands),  the  presence  of  outliers  or  anomalies,  and  non-homogeneity.  Even  though  many 
of  the  current  detectors,  such  as  RX,  are  hindered  by  these  correlation  problems,  they  are 
still  used  in  practice  because  they  have  a  relatively  fast  processing  time,  are  intuitively 
easy  to  understand,  and  are  simple  to  implement. 

Most  anomaly  detectors  have  numerous  user-defined  settings  that  are  required  to 
implement  the  algorithm.  Using  improper  settings  can  have  a  negative  effect  on  the 
overall  performance  of  the  algorithm.  Additionally,  a  particular  set  of  images  being 
analyzed  could  benefit  from  one  setting  combination,  whereas  another  set  could  be 
hindered  by  said  combination.  Therefore,  finding  a  setting  combination  that  is  robust  to  a 
vast  collection  of  images  is  pertinent.  This  leads  to  the  idea  of  implementing  Robust 
Parameter  Design  (RPD)  to  find  the  setting  combinations  which  are  successful  across  a 
wide  range  of  images  with  little  variability.  To  do  this,  the  image  characteristics  need  to 
be  treated  as  noise  variable  and  the  settings  are  treated  as  control  variables. 

1.2  Original  Contributions  and  Research  Overview 

The  first  goal  of  this  research  is  to  address  correlation  problems  inherent  to  HSI 
that  are  often  ignored  by  the  research  community  when  perfonning  anomaly  detection  or 
classification.  The  four  main  correlation  problems  are:  spatial  correlation  (correlation 
between  pixels  due  to  proximity),  spectral  correlation  (correlation  between  spectral 
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bands),  the  presence  of  outliers  or  anomalies,  and  non-homogeneity.  The  second  goal  of 
the  research  is  to  extend  the  standard  Noise  by  Noise  (N  x  N)  RPD  model  recently 
introduced  by  Mindrup  et  al.  (2012)  to  include  control  by  noise  by  noise  (C  x  N  x  N)  and 
noise  by  control  by  control  (N  x  C  x  C)  interactions. 

Chapter  2  will  introduce  two  new  anomaly  detectors:  Linear  RX  (LRX),  a  variant 
of  Reed  and  Yu  (1990)  RX  detector,  and  Iterative  Linear  RX  (ILRX),  a  variant  of  the 
Taitano  et  al.  (2010)  Iterative  RX  (IRX)  detector.  LRX  addresses  spatial  correlation 
related  to  RX  by  establishing  a  mean  vector  and  covariance  matrix  using  data  that  is,  on 
average,  farther  from  each  other  than  the  standard  RX  window.  The  IRX  detector  allows 
for  the  exclusion  of  outliers  in  the  mean  vector  and  covariance  matrix  calculations, 
thereby  promoting  a  more  accurate  assessment  of  the  target  pixel.  ILRX  then  exploits  the 
innovations  of  both  LRX  and  IRX. 

Chapter  3  continues  addressing  correlation  in  HSI,  but  this  time  with  the  goal  of 
classification.  The  Adaptive  Matched  Filter  (AMF)  with  Manolakis  et  al.  (2009) 
suggested  improvements,  to  be  called  the  Robust  AMF,  is  competed  against  the  Standard 
AMF.  The  improvements  suggested  by  Manolakis  et  al.  (2009)  are  to  remove  the 
anomalies  from  the  image  prior  to  calculating  the  required  mean  vector  and  covariance 
matrix.  Additionally,  two  more  AMFs  will  be  tested  against  the  Standard  AMF  and 
Robust  AMF.  Clustered  AMF,  which  clusters  the  image  after  removal  of  the  anomalies 
and  classifies  the  pixel  of  interest  using  the  mean  vector  and  covariance  matrix  of  the 
cluster  in  which  the  pixels  is  located;  and  Largest  Cluster  AMF,  which  similarly  clusters 
the  image  after  removal  of  the  anomalies,  however,  it  classifies  the  pixel  of  interest  using 
the  mean  vector  and  covariance  matrix  of  the  largest  cluster  in  the  image.  Robust  AMF 
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addresses  the  problem  of  anomalous  pixels  skewing  the  required  statistics.  Clustered 
AMF  and  Largest  Cluster  AMF  exploit  the  idea  of  Robust  AMF  and  address  the  concern 
of  non-homogeneity. 

Chapter  4  provides  the  required  statistical  models  to  extend  the  Mindrup  et  al. 
(2012)  N  x  N  RPD  model  to  include  higher  order  terms,  including  the  C  x  N  x  N  and  N  x 
C  x  C  interaction  terms.  These  higher  order  models  will  then  be  applied  to  the 
Autonomous  Global  Anomaly  Detector  (AutoGAD),  a  HSI  anomaly  detector,  to  locate 
better  operating  parameter  settings,  using  properties  of  the  hyperspectral  images  as 
system  noise  (Johnson  et  al.,  2012).  The  benefit  of  the  models  will  be  demonstrated 
through  increased  R  adj  and  decreased  Mean  Squared  Error  (MSE),  and  new  AutoGAD 
settings  which  provide  higher  mean  responses  and  lower  response  variance. 


6 


2  Towards  the  Mitigation  of  Correlation  Effects  in  Anomaly  Detection  for 


Hyperspectral  Imagery 


2.1  Introduction 

Remote  sensing  involves  studying  a  given  object  without  initiating  physical 
contact  (Eismann,  2012;  Schott,  1997);  of  particular  interest  are  passive  remote  sensing 
systems  which  rely  on  natural  sources  of  illumination.  Hyperspectral  Imagery  (HSI) 
systems  are  passive  systems  which  collect  spectrally  contiguous  data  across  a  large  swath 
of  the  electromagnetic  spectrum,  permitting  material  identification  through  fine  spectral 
sampling.  One  of  the  fundamental  problems  faced  by  practitioners  in  this  area  is 
analyzing  the  highly  correlated  data  streams  that  are  output  from  these  models  (Banks  et 
al,  2009).  Computer  models,  such  as  discrete-event  simulations,  are  used  to  aid  in 
understanding  real-world  processes.  Simulation  analysts  must  deal  with  temporal 
correlation.  In  this  research,  we  are  concerned  with  highly  correlated  data  of  both  a 
spatial  and  spectral  nature.  Specifically,  we  will  address  the  spatial  correlation  problem. 

Typically,  HSI  encompasses  the  visible  to  infrared  regions  of  the  spectrum, 
containing  anywhere  from  more  than  20  to  250  plus  spectral  bands,  whereas  standard 
digital  cameras  capture  three  coarsely  sampled  bands:  red,  green,  and  blue.  The  vast 
amount  of  data  contained  in  HSI  affords  a  great  opportunity  to  detect  anomalies  in  an 
image  using  standard  multivariate  statistical  techniques,  as  each  material  reflects 
individual  wavelengths  of  the  spectrum  differently.  However,  the  large  amount  of  data 
contained  within  each  image  often  requires  dimensionality  reduction/feature  selection 
techniques  to  be  employed  such  that  analysis  algorithms  operate  on  lower  dimensional, 
uncorrelated  data  as  described  by  Landgrebe  (2002). 
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Anomaly  detection  refers  to  the  location  of  spectral  data  that  does  not  belong 
within  a  given  set.  It  can  be  used  in  numerous  applications  such  as  financial  fraud 
detection,  computer  security,  and  military  surveillance  (Chandola  et  ah,  2009).  In  HSI 
applications,  anomaly  detection  is  used  to  find  objects  of  interest  within  the  image  by 
locating  pixels  statistically  different  from  the  non-anomaly  pixels,  referred  to  as  the 
background.  Three  broad  categories  of  anomaly  detection  methods  exist  (Chandola  et  ah, 

2009) :  supervised,  semi-supervised,  and  unsupervised  detection.  Supervised  detection 
requires  a  set  of  training  data  that  includes  both  the  background  and  anomaly  data  prior  to 
analysis.  Semi-supervised  detection  also  requires  a  training  set;  however,  it  only  requires 
background  data.  Differences  between  images,  e.g.,  the  desert  and  forest  images  in  this 
research  present  a  problem  that  effects  supervised  or  semi-supervised  methods  when 
applied  to  HSI.  Therefore,  it  is  difficult  to  train  a  detector  on  one  image  and  test  it 
against  another.  The  standard  work  around  for  semi-supervised  detection  is  to  select  a 
random  set  of  data.  This  practice  is  successful  because  the  set  of  anomalies  in  the  data 
set  is  assumed  to  be  sparse;  hence,  the  random  selection  should  provide  a  representative 
sample  of  the  true  background.  Unsupervised  detection  does  not  require  a  training  set, 
and  is  therefore  more  appropriate  when  analyzing  HSI  data. 

The  literature  on  anomaly  detection  in  HSI  has  increased  following  the 
publication  of  Reed  and  Yu’s  paper  on  the  RX  detector  in  1990  (Reed  and  Yu,  1990),  to 
include  various  articles  with  modifications  or  additions  to  the  RX  detector  (Eismann, 
2012;  Hsueh  and  Chang,  2004;  Yanfeng  et  ah,  2006;  Liu  and  Chang,  2008;  Taitano  et  ah, 

2010) ,  classification  and  discrimination  methods  (Eismann,  2012;  Chang  and  Ren,  2000; 
Chang  and  Chiang,  2002),  different  fusion  techniques  (Acito  et  al.,  2006;  Nasrabadi, 


2008),  and  overview  articles  (Manolakis  and  Shaw,  2002;  Stein  et  al.,  2002;  Smetek  and 
Bauer,  2008)  of  detection  algorithms,  including  RX.  Related  work  includes  a  number  of 
additional  detectors,  such  as:  Support  Vector  Data  Description  (SVDD)  (Tax  and  Duin, 
1999;  2004;  Banerjee  et  ah,  2006;  2007),  multiple  window  detectors  (Yanfeng  et  ah, 
2006;  Kwon  et  ah,  2003;  Liu  and  Chang,  2004),  and  various  mixture  models  (Eismann, 
2012;  Smetek  and  Bauer,  2008;  Grossman  et  ah,  1998;  Clare  et  al.  2003).  More  recently, 
work  has  been  conducted  using  synthetically  generated  or  simulated  data  to  supplement 
the  low  number  of  hyperspectral  images  with  available  truth  masks  that  are  typically 
accessible  to  researchers  (Huesh  and  Chang,  2004;  Shi  and  Healey,  2005;  Gaucel  et  al., 
2005;  Bellucci  et  al,  2010). 

In  practice,  the  RX  method,  when  applied  to  hyperspectral  data,  is  likely  to  have  a 
high  false  positive  detection  percentage  because  the  underlying  statistics  assumes  the  data 
being  analyzed  follows  a  Gaussian  distribution.  However,  Banerjee  et  al.  (2006)  showed 
that  HSI  is  not  often  unimodal.  Further,  to  compound  the  non-Gaussian  difficulty,  an 
image,  by  its  very  nature,  is  correlated  and  heterogeneous.  However,  RX  is  still  used  in 
practice  because  it  offers  fast  processing  times,  is  intuitively  easy  to  understand,  and  is 
algorithmically  simple. 

The  purpose  of  this  chapter  is  to  present  modifications  to  the  standard  RX 
algorithm.  A  new  method,  called  Linear  RX  (LRX),  has  the  ability  to  overcome  some  of 
the  correlation  problems  hindering  RX  (Reed  and  Yu,  1990)  and  Iterative  RX  (IRX) 
(Taitano  et  al.  2010).  This  research  contrasts  the  performance  of  LRX  and,  another  new 
method,  its  variant  Iterative  Linear  RX  (ILRX),  to  RX  and  IRX.  Additionally,  to  further 
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test  the  benefit  of  the  new  algorithms,  both  algorithms  are  tested  against  the  global 
SVDD  algorithm,  a  promising  new  supervised  HSI  detector  (Banerjee  et  al.,  2006;  2007). 

The  remainder  of  this  chapter  is  organized  as  follows:  Section  2  presents  a 
description  of  the  algorithms  contrasted  in  this  research,  Section  3  details  the 
methodology  used  to  compete  the  five  anomaly  detectors,  Section  4  provides  the 
experimental  results,  and  in  Section  5,  the  chapter  is  concluded. 

2.2  Algorithms 

This  section  of  the  chapter  describes  how  each  of  the  five  anomaly  detection 
algorithms  are  contrasted  and  implemented,  and  explains  the  use  of  the  Nonnalized 
Difference  Vegetation  Index  (NDVI)  in  post-processing  to  realize  improved  results  from 
the  detectors.  Due  to  the  large  amount  of  data  contained  within  a  given  hyperspectral 
image,  it  is  standard  practice,  prior  to  applying  an  anomaly  detector,  to  reduce  the 
dimensionality  of  the  image  by  running  Principal  Component  Analysis  (PCA)  (Farrell 
and  Mersereau;  2005)  on  the  whole  data  set,  retaining  the  P  largest  Principal  Components 
(PCs). 

2.2.1  The  RX  Detector  (RX) 

The  RX  detector,  introduced  by  Reed  and  Yu  (1990),  detects  anomalies  utilizing  a 
moving  window  approach,  where  the  pixel  in  the  center  is  scored  by  comparison  to  the 
remaining  pixels  in  the  window.  The  window,  usually  square  in  shape,  is  shifted,  one 
pixel  at  a  time,  across  a  row  of  pixels  with  the  new  center  pixel  being  scored  at  each  step, 
as  displayed  in  Figure  4a,  where  the  red  square  represents  the  test  pixel  and  the  box 
around  the  test  pixel  represents  the  pixels  compared  with  the  test  pixel  to  generate  an  RX 
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score.  This  process  is  continued  until  all  possible  pixels  have  been  analyzed.  Each  test 
pixel,  x,  is  given  a  score  based  upon  a  generalized  likelihood  ratio  test  which  simplifies  to 
equation  (1)  if  the  pixels  within  the  test  window  are  assumed  to  be  normally  distributed 
with  mean  vector  of  the  background  pixels,  p,  and  covariance  matrix  X.  It  should  also  be 
noted  that  as  the  number  of  pixels  in  the  window,  N,  approaches  to  infinity,  the  RX  score 
becomes  the  squared  Mahalanobis  distance  between  the  test  pixel  and  the  mean  vector  of 
the  background  pixels, 


RX(x)  =  (x-  ju)1 


r  N  V  |  (  1  y.  ..,r 
U+n 


-i-i 


N  +  l 


(• x-ju)(x-juY 


(x-jU)- 


(1) 


Pixels  with  an  RX  score  greater  than  y~(U  (n-  i).  where  a  represents  the  corresponding 
significance  level  of  the  chi-squared  distribution,  are  labeled  anomalous  by  the  RX 
detector. 


2.2.2  The  Iterative  RX  Detector  (IRX) 

The  IRX  detector  (Taitano  et  al.,  2010)  is  an  extension  of  the  standard  RX 
detector;  IRX  extends  the  RX  detector  through  an  iterative  process,  where  each  iteration 
sees  IRX  calculating  an  improved  estimate  of  the  mean  vector  and  covariance  matrix  of 
the  background  pixels. 

The  IRX  algorithm  is  processed  using  the  following  steps: 

1 .  Each  iteration  begins  by  running  the  standard  RX  algorithm  to  calculate  an  RX 
score,  i.e.  RX(x),  for  each  testable  pixel  in  the  image;  however,  to  improve 
accuracy,  pixels  selected  as  anomalies  in  the  previous  iteration  are  excluded  from 
the  data  used  to  estimate  the  mean  vector  and  covariance  matrix  of  the 
background.  Note:  At  the  start  of  the  algorithm  the  set  of  anomalies  is  empty. 
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2.  Using  the  RX  scores  calculated  in  step  1,  a  pixel,  jc,  is  considered  anomalous  if  its 
RX  score  is  greater  than  x  a,  (N- 1).  This  ends  a  given  iteration,  allowing  for  pixels 
to  enter  and  exit  the  set  of  anomalies. 

3.  The  algorithm  ends  if  the  set  of  anomalies  determined  in  step  2  is  identical  to  the 
set  of  anomalies  from  the  previous  iteration  or  the  maximum  number  of  iterations 
has  been  reached.  Otherwise,  the  algorithm  iterates  again  from  step  1 . 

2.2.3  The  Linear  RX  (LRX)  and  Iterative  Linear  RX  (ILRX)  Detectors 

LRX  and  ILRX  are  similar  to  RX  and  IRX,  respectively,  however,  instead  of  a 
window  being  moved  through  the  image,  they  employ  a  vertical  line  of  pixels  above  and 
below  the  test  pixel.  If  the  number  of  pixels  above  or  below  the  test  pixel  exceeds  the 
height  of  the  image,  the  required  pixels  are  taken  from  the  bottom  of  the  previous  column 
or  from  the  top  of  the  following  column,  Figure  4b.  The  line  is  used  to  increase  the 
average  distance  between  the  pixels  used  to  estimate  the  mean  vector  and  covariance 
matrix.  This  can  be  seen  in  Figure  5,  which  shows  the  average  distance  between  pixels 
using  a  window  and  a  line.  Increasing  the  average  distance  between  pixels  mitigates  the 
deleterious  effects  of  correlation  due  to  the  spatial  proximity  of  the  pixels.  Such  a  step 
allows  for  the  reduction  of  bias  and  error  in  the  estimation  of  the  mean  vector  and 
covariance  matrix.  A  possible  concern  for  such  an  approach  might  be  that  the  reduction 
in  the  contribution  to  the  bias  due  to  spatial  correlation  may  be  offset  by  the  contribution 
to  the  bias  due  to  image  non-stationarity.  This  issue  is  discussed  later  in  the  chapter  and 
as  demonstrated  below,  is  not  a  concern  for  images  we  tested. 
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Figure  4:  RX  Window  vs.  LRX  Line 


Figure  5:  Average  Distance  Between  Pixels 


2.2.4  Support  Vector  Data  Description  (SVDD) 

Banerjee  et  al.  (2006;  2007)  extended  the  SVDD  algorithm  by  Tax  and  Duin 
(1999;  2004)  into  an  HSI  anomaly  detector.  SVDD  is  a  one-class  classifier,  where  points 
are  considered  in  class  or  out  of  class,  where  the  support  of  the  distribution  is  considered 
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as  the  minimally  enclosing  hypersphere  in  the  feature  space.  In  operation,  SVDD  takes  a 
training  set,  T  =  {jc(.,  i  =  1,  ...  ,M) ,  of  M background  pixels,  x,  is  randomly  selected 
from  the  image  as  the  training  data.  SVDD  then  attempts  to  determine  the  minimum 
volume  hypersphere,  S  ( R,a )  =  jx :  ||x  -  a||  <  f?| ,  as  the  L2  nonn  or  Euclidean  norm,  with 

radius  R  >  0  and  center  a  that  contains  the  set  of  M  randomly  chosen  pixels.  This  is 
obtained  by  solving  the  following  minimization  problem, 

min(7?)  subject  to  xi  eS,  i  =  1  (2) 

The  radius  R  and  center  a  of  the  hypersphere  are  determined  by  minimizing  the 
Lagrangian,  L,  with  respect  to  the  weights,  or  support  vectors,  «„ 

M 

Z(f?,a,«,.)  =  f?2  -((x;.,x;)-2(a,x.)  +  (a,a))},  (3) 

i=l 

where  (•,•)  represents  the  dot  product  of  the  operation  of  the  two  vectors. 

After  optimizing,  the  kernel  technique,  which  transforms  data  to  a  different 
dimensional  space  for  simpler  computations  without  ever  explicitly  calculating  the 
mapping,  can  be  applied  which  leads  to  the  SVDD  statistic, 

M 

SVDD(y)  =  R2 -K(y,y)  +  2YjaiK(y,xi),  (4) 

1=1 

where  K{x,y)  is  the  kernel  mapping  defined  by 

K(x,  y)  =  exp  (-  ||x  -  yf  /  <r2 ) ,  (5) 

and  variable  o“  is  a  radial  basis  function  parameter  used  as  a  scaling  factor  to  determine 
the  size  of  the  hypersphere,  hence  adjusting  how  well  the  SVDD  algorithm  generalizes 
the  incoming  data. 
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When  applied  to  HSI,  the  SVDD  algorithm  is  processed  in  the  following  steps 
(Banerjee  et  ah,  2006): 

1 .  Randomly  select  M  pixels  from  the  image. 

2.  Estimate  an  optimal  value  for  a  by  detennining  a  value  that  will  minimize  the 
false  positive  rate  or  the  number  of  background  pixels  classified  as  targets. 

3.  Estimate  the  parameters  ( R ,  a,  a;)  needed  to  model  the  hypersphere. 

4.  Determine  SVDD(y)  for  each  test  pixel.  If  SVDD(y)  >  t,  for  a  user  defined 
threshold  t,  the  pixel  is  labeled  an  anomaly,  otherwise  if  SVDD(y)  <  t  the  pixel  is 
labeled  as  background. 

The  SVDD  algorithm  is  considered  in  this  research  because  it  is  a  novel  and 
promising  state  of  the  art  detector  and  as  a  semi-supervised  method,  it  allows  for  an 
interesting  performance  contrast  relative  to  the  other  unsupervised  methods. 

2.2.5  Normalized  Difference  Vegetation  Index  (NDVI) 

It  is  fairly  common  to  get  false  positives,  i.e.  pixels  labeled  as  anomalies  that  are 
truly  background  pixels,  when  attempting  to  find  anomalies,  generally  man-made  objects, 
in  HSI  using  one  of  the  previously  described  methods.  One  relatively  simple  way  to 
reduce  false  positives  is  to  implement  some  form  of  pre-  or  post-processing.  Since  we 
are  attempting  to  locate  anomalies  without  prior  knowledge,  one  applicable  post¬ 
processing  method  is  applying  the  Nonnalized  Difference  Vegetation  Index  (NDVI),  as 
introduced  by  Rouse  et  al.  (1973),  to  remove  pixels  that  are  likely  to  be  vegetation. 

NDVI  gauges  whether  or  not  a  given  pixel  is  green  vegetation  by  using  the 
absorptive  cutoff  of  chlorophyll  between  the  visible  and  near  infrared  spectrum.  It  does 
this  by  comparing  the  intensity  of  the  visible  bands  to  the  intensity  of  the  near  infrared 
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bands,  since  the  reflectance  in  the  near  infrared  bands  is  considerably  larger  for 
vegetation.  The  measure  is  given  by 


NDVI  = 


NIR-Red 
NIR  +  Red  ’ 


(6) 


where  NIR  denotes  the  radiance  value  of  the  near  infrared  spectral  band  and  Red  denotes 
the  radiance  value  of  the  red  spectral  band  (Eismann,  2012;  Schott,  1997;  Rouse  et  ah, 
1973;  Landgrebe,  2003).  Prior  to  locating  the  anomalies,  an  NDVI  threshold  for  the 
image  and  an  NDVI  score  for  each  pixel  is  calculated.  The  NDVI  threshold  was 
determined  by  plotting  the  NDVI  scores  for  all  of  the  images  and  setting  a  threshold. 
Subsequently,  all  pixels  with  an  NDVI  score  above  the  threshold  are  classified  as 
vegetation.  Once  the  anomaly  detector  has  been  run,  regardless  of  the  indications,  all 
declared  vegetation  pixels  are  classified  as  background.  Since  the  desert  images  display 
NDVI  scores  that  are,  for  the  most  part,  below  the  selected  threshold,  very  few  pixels  will 
be  classified  as  vegetation;  hence,  very  few  potential  false  positives  are  deleted. 


2.3  Methodology 

The  five  anomaly  detectors  were  compared  using  six  images  from  the  Forest 
Radiance  I  and  Desert  Radiance  II  collection  events,  from  the  Hyperspectral  Digital 
Imagery  Collection  Equipment  (HYDICE)  push-broom,  aircraft  mounted  sensor  (Rickard 
et  ah,  1993).  The  HYDICE  sensor  collects  spectral  data  in  210  bands  between  397  mn 
and  2,500  mn,  including  visible,  and  infrared  data.  Due  to  atmospheric  absorption 
effects,  only  145  bands  were  used  in  the  analysis  of  the  images.  A  description  of  each 
image  is  shown  in  Table  1  and  the  images  are  displayed  in  Figure  6. 
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Table!:  ARES  Image  Data 


Image 

Size 

Total  Pixels 

Anomalies 

Anomalous  Pixels 

ARES  ID 

291  x 199 

57,909 

6 

235 

ARES  IF 

191  x  160 

30,560 

10 

1,007 

ARES  2D 

215  x 104 

22,360 

46 

523 

ARES  2F 

312  x  152 

47,424 

30 

307 

ARES  3F 

226  x  136 

30,736 

20 

145 

ARES  4F 

205  x  80 

16,400 

29 

109 

ARES  ID 


ARES  IF 


Three  large  anomalies  circled  in  red. 


ARES3F 


Figure  6:  ARES  Images 


ARES  2D 


ARES4F 
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Due  to  the  small  number  of  images  available  and  the  need  to  train  and  validate 
each  of  the  algorithms,  all  six  images,  presented  in  Table  1  and  Figure  6,  were  divided  in 
half  to  create  a  top  and  bottom  portion  of  the  image.  Then,  a  top  or  bottom  of  each 
image,  chosen  randomly,  was  selected  for  the  training  set  of  images  and  the  other  half 
was  used  in  the  validation  set.  The  training  set  included  the  top  of  ARES  2D  and  ARES 
4F  and  the  bottom  of  ARES  ID,  ARES  IF,  ARES  2F,  and  ARES  3F.  It  may  be  a  stretch 
to  try  and  draw  too  much  from  a  comparison  of  five  algorithms  and  only  six  images; 
however,  we  believe  our  experiments  point  to  the  clear  potential  of  the  new  technique. 
Furthermore,  while  splitting  the  images  in  half  to  double  our  data  may  not  be  the  best 
method  for  creating  training  and  test  sets,  it  is  certainly  better  than  using  the  same  images 
for  training  and  test.  We  acknowledge  that  the  image  halves  are  spectrally  correlated  due 
to  shared  weather,  viewing  conditions,  etc.  However,  correlation  in  the  spatial  domain 
appears  to  be  minimal. 

Each  of  the  algorithms  was  tested  across  a  large  combination  of  parameter 
settings  in  order  to  find  the  optimal  settings  for  the  algorithm.  The  parameters  were:  the 
number  of  PCs  to  retain,  the  number  of  pixels  to  use  in  the  window/line  using  RX 
methods  or  the  size  of  the  training  set  for  SVDD,  the  number  of  iterations  to  use  for  the 
iterative  methods,  whether  NDVI  was  used  in  post  processing,  and  the  parameter  value,  a 
for  SVDD.  This  is  summarized  in  Table  2.  The  last  column  is  displayed  to  show  how 
many  combinations  of  parameters  were  collected  for  each  method  on  a  single  image. 
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Table  2:  Algorithm  Parameter  Settings 


Algorithm 

Number  of  PCs 

Number  of  Pixels* 

Number  of 

Iterations 

o2 

NDVI 

Total  Data 

Points  Collected 
(Per  image) 

RX 

3,  4, 

..,  10 

172,  192 .  252 

1 

- 

Yes/No 

80 

IRX 

3,  4, 

..,  10 

172,  192 .  252 

10,  20,.. 

,  50 

- 

Yes/No 

400 

LRX 

3,  4, 

..,  10 

0.5*H,  1*H,  1. 5*1-1,  2*H 

1 

- 

Yes/No 

64 

ILRX 

3,  4, 

..,  10 

0.5*H,  1*H,  1. 5*1-1,  2*H 

10,  20,.. 

,  50 

- 

Yes/No 

320 

SVDD 

3,  4, 

..,  10 

172,  192 .  252 

1 

10,  20,... 

300 

Yes/No 

2,400 

The  algorithms’  anomaly  detection  performance  on  the  selected  test  set  was 
compared  through  the  use  of  Receiver  Operating  Characteristic  (ROC)  curves  (Fawcett, 
2006).  Since  the  RX  algorithms’  test  statistics  are  based  upon  the  chi-squared 
distribution,  the  significance  level  a  was  varied  to  serve  as  the  threshold  for  the  ROC 
curves.  Similarly,  the  user-defined  anomaly  threshold  t  was  varied  in  the  SVDD 
algorithm  to  generate  ROC  curves.  Due  to  the  fact  that  a  large  number  of  settings  for 
each  algorithm  were  examined,  visual  inspection  of  the  ROC  curves  was  not  feasible. 
Therefore,  the  individually-tested  setting  combinations  for  each  algorithm  were  scored 
using  the  Neyman-Pearson  technique  (Kay,  1993).  Specifically,  the  True  Positive 
Fraction  (TPF)  for  the  anomalous  pixels  detected  in  each  of  the  six  images  was  averaged 
when  the  corresponding  False  Positive  Fraction  (FPF)  is  equal  to  0. 1 .  A  FPF  of  0. 1  was 
chosen  because  it  was  deemed  that  if  the  FPF  exceeded  0.1,  the  algorithm  would  no 
longer  be  of  any  practical  use  due  to  over-saturation  of  misclassified  data. 

After  the  setting  combination  with  the  highest  average  TPF  at  a  FPF  =  0.1  was 
determined  for  each  anomaly  detector,  its  performance  was  validated  by  taking  the  best 
settings  for  each  individual  algorithm,  and  running  them  on  the  six  validation  images. 

An  artifact  of  the  RX  and  IRX  methods,  as  described  by  Reed  and  Yu  (1990)  and 
Taitano  et  al.  (2010),  is  an  area  of  pixels  that  form  a  border  around  the  image  which 
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cannot  be  tested  due  to  the  requirement  of  the  window.  Methods  to  allow  the  algorithms 
to  test  the  border  pixels  can  be  implemented,  such  as  using  only  the  part  of  the  window 
that  is  within  the  image  or  moving  the  test  pixel  from  the  center  of  the  window  when  it  is 
against  the  border  of  the  image.  However,  in  this  research,  the  RX  and  IRX  algorithms 
as  originally  designed  were  competed  and  the  border  pixels  that  could  not  be  tested  were 
not  considered  in  the  performance  evaluation. 

2.4  Results 

Relative  to  the  training  data,  the  results  for  the  best  settings  of  each  algorithm  by 
image  and  overall  average  are  shown  in  Table  3.  It  can  be  seen  that  LRX  achieves 
equivalent  performance  to  RX  in  most  images;  ARES  IF  is  an  exception  where  the 
spatially  large  objects  appear  to  confound  the  RX  algorithm,  yet  are  detected  by  LRX. 
With  iterations,  ILRX  was  the  best  perfonning  algorithm  or  tied  with  IRX  in  all  cases, 
except  ARES  2F. 


Table  3:  Training  Data  Results  (TPF  at  FPF  =  0.1) 


Algorithm 

ARES  ID 

ARES  IF 

ARES  2D 

ARES  2F 

ARES  3F 

ARES  4F 

Average 

RX 

0.8673 

0.3410 

0.9933 

0.9455 

0.9535 

0.8649 

0.8276 

IRX 

1.0000 

0.4615 

1.0000 

1.0000 

0.9744 

0.9444 

0.8967 

LRX 

0.9118 

0.7916 

0.9474 

0.7632 

0.8308 

0.7990 

0.8406 

ILRX 

1.0000 

1.0000 

1.0000 

0.9912 

1.0000 

0.9500 

0.9902 

SVDD 

0.9558 

0.9588 

0.9880 

0.9386 

0.9846 

0.8750 

0.9501 

The  corresponding  best  tested  setting  for  each  of  the  algorithms  is  displayed  in 
Table  4.  “Yes”  or  “No”  in  the  NDVI  column  for  SVDD  implies  the  algorithm  achieved 
the  same  results  whether  or  not  NDVI  was  used  in  post  processing.  It  should  be  noted 
that  ILRX  was  the  most  robust  of  the  algorithms  tested.  During  training,  eleven  different 
parameter  settings  realized  the  same  values  shown  in  Table  3.  No  other  algorithm  had 
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multiple  parameter  settings  that  obtained  the  optimal  results.  Since  multiple  setting 
combinations  were  found  for  ILRX  they  were  all  tested  on  the  validation  images. 


Table  4:  Best  Tested  Parameter  Settings  from  Training  Data 


Algorithm 

Number  of 

PCs 

Number 

of  Pixels 

Number  of 

Iterations 

a2 

NDVI 

RX 

9 

232 

1 

- 

Yes 

IRX 

9 

252 

20 

- 

Yes 

LRX 

9 

1*H 

1 

- 

Yes 

ILRX 

10 

2*H 

30 

- 

Yes 

SVDD 

10 

252 

1 

60 

Yes  or  No 

The  results  from  the  validation  images  are  displayed  in  Table  5,  to  include  the 
best  and  worst  tested  parameter  settings  of  eleven  training  combinations  validated  for 
ILRX.  It  can  be  seen  that  ILRX  is  still  the  top  performer  overall  regardless  of  whether 
the  best  or  worst  training  settings  were  implemented.  Furthermore,  the  ILRX  algorithm 
received  the  smallest  drop  in  average  TPF  when  the  settings  were  tested  on  the  validation 
images,  as  compared  to  the  training  images. 


Table  5:  Validation  Data  Results  (TPF  at  FPF  =  0.1) 


Algorithm 

ARES  ID 

ARES  IF 

ARES  2D 

ARES  2F 

ARES  3F 

ARES  4F 

Average 

RX 

0.9016 

0.2075 

0.9920 

0.8282 

0.9545 

0.8864 

0.7950 

IRX 

1.0000 

0.3186 

1.0000 

0.9495 

1.0000 

1.0000 

0.8780 

LRX 

0.9645 

0.4902 

0.9890 

0.8342 

0.9104 

0.7669 

0.8259 

ILRX  (Best) 

1.0000 

0.9449 

1.0000 

0.9741 

1.0000 

1.0000 

0.9865 

ILRX  (Worst) 

1.0000 

0.7394 

1.0000 

0.9646 

1.0000 

0.9744 

0.9464 

SVDD 

0.9180 

0.9850 

0.9983 

0.8641 

0.9330 

0.8200 

0.9197 

Figure  7  shows  the  ROC  curves  for  each  of  the  six  validation  images  comparing 


TPF  to  FPF  using  the  best  tested  settings  for  each  algorithm  from  the  training  images,  as 
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displayed  in  Table  4.  In  every  case,  IRX  performs  better  than  RX  and  ILRX  performs 
better  than  LRX;  hence,  the  comments  below  focus  on  IRX,  ILRX,  and  SYDD. 


- RX 

—  —  —  Iterative  RX 

- Linear  RX 

Iterative  Linear  RX 
-'-.'.SVDD _ 

Figure  7:  ROC  Curves  of  Best  Tested  Parameter  Settings  on  Validation  Images 


IRX  did  well  on  all  of  the  images  except  when  there  are  large  anomalies,  such  as 
the  ones  highlighted  in  ARES  IF.  This  is  because  the  window,  as  it  moves  through  a 
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large  anomaly,  becomes  dominated  by  the  local  anomalous  pixels  rather  than  the  general 
background  of  the  image.  This  defeats  the  purpose  of  the  window,  which  is  to  give  a 
good  estimate  of  the  true  background  of  the  image.  As  a  result,  the  pixel  being  analyzed 
appears  similar  to  the  other  pixels  in  the  window  and  is  not  classified  as  an  anomaly. 
ILRX  mitigates  this  problem  through  its  use  of  a  vertical  line  which  only  contains  a  small 
portion  of  even  a  large  anomaly  and  considerably  more  background  pixels. 

ILRX  had  the  highest  performance  or  was  comparable  with  the  other  detectors  in 
all  of  the  images.  It  had  slight  problems  with  the  rock  fonnations  in  ARES  2F  and  4F 
that  IRX  does  not  detect  due  to  the  window  effect  of  large  images;  however,  this  is 
difficult  to  discern  from  the  ROC  curve  due  to  ILRX  detecting  most  of  the  anomalies  at  a 
relatively  low  threshold. 

SVDD  consistently  performed  better  than  both  of  the  non-iterative  methods, 
however,  it  was  inconsistent  with  regard  to  its  performance  against  the  iterative  methods. 
Also,  the  fact  that  it  is  a  semi-supervised  method  that  randomly  selects  training  data  can 
lead  to  less  than  optimal  performance  from  the  detector.  The  only  image  where  SVDD 
outperformed  the  other  algorithms  is  ARES  IF,  where  IRX  has  trouble  with  large 
anomalies  and  ILRX  has  difficulty  with  vertical  roads. 

Figure  8  shows  the  color  representation  of  the  image  and  the  pixels  classified  as 
anomalies,  or  anomalous  pixel  maps,  for  IRX,  ILRX,  and  SVDD  on  the  validation 
images  ARES  IF  and  4F,  note  that  the  masks  of  IRX  are  smaller  because  it  was  not  used 
to  test  the  borders  of  the  image.  The  anomalous  pixel  maps  were  generated  at  the  first 
knee  in  the  ROC  curve  so  that  they  were  not  overwhelmed  by  false  positive  pixels.  The 
corresponding  TPF  and  FPF  are  displayed  below  each  of  the  images.  It  can  be  easily 
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seen  in  ARES  IF  that  SVDD  is  realizing  superior  results,  primarily  because  IRX  is  not 
locating  the  large  anomalies  and  ILRX  in  addition  to  finding  almost  all  of  the  anomalous 
pixels  is  having  some  difficulties  with  the  roads.  In  ARES  4F  both  of  the  RX  methods 
are  giving  high-quality  results  and  the  SVDD  algorithm  is  getting  inundated  by  the  large 
rock  fonnation. 

The  embellishments  to  RX  follow  a  reasoned  pattern.  IRX  allows  for  the 
exclusion  of  outliers  in  the  local  mean  vector  and  covariance  matrix  calculations,  thereby 
promoting  a  more  accurate  assessment  of  the  target  pixel  (2010).  LRX  mitigates  the 
correlation  difficulties  related  to  RX  by  establishing  the  mean  vector  and  covariance 
matrix  that  is,  on  average,  further  from  each  other  than  the  standard  RX  window.  The 
possible  concern  that  the  reduction  in  the  contribution  to  the  bias  due  to  spatial 
correlation  may  be  offset  by  the  contribution  to  the  bias  due  to  image  non-stationarity 
was  not  realized  here.  We  believe  this  is  due  to  the  following  factors.  If  one  considers 
the  non-stationarity  in  the  image  as  being  characterized  by  distinct  pixel  clusters  then  the 
variation  between  these  clusters  appears  to  be  significantly  less  than  the  variation 
between  the  background  pixels,  in  general,  and  the  target  pixels.  Further,  the  running 
covariance  matrix  estimate  calculated  across  the  background  pixels  appears  to  be  fairly 
robust  to  the  heterogeneity  as  evident  by  the  algorithms  performance.  The  notion  of 
using  separate  estimates  from  the  individual  clusters  is  the  subject  of  current  research. 
Finally,  ILRX  exploits  the  innovations  of  both  IRX  and  LRX.  Taken  together,  these 
innovations  make  ILRX  a  very  competitive  algorithm. 
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ILRX 


TPF:  0.2507 
FPF:  0.0934 

SVDD 


TPF:  0.9411 
FPF:  0.0750 


TPF:  0.9024 
FPF:  0.0169 


ARES  4F 


RGB  Image 


IRX 


ILRX 


TPF:  0.9286 
FPF:  0.0079 
SVDD 


TPF:  0.9565 
FPF:  0.0068 


TPF:  0.7101 
FPF:  0.0546 


Figure  8:  Anomalous  Pixel  Maps 
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2.5  Conclusions 


This  chapter  presented  LRX  and  ILRX,  updates  to  the  newly  introduced  IRX 
algorithm.  Through  experimentation,  the  line  of  pixels  used  by  ILRX  shows  an 
advantage  over  RX  and  IRX  in  that  it  can  help  mitigate  the  deleterious  effects  of 
correlation  due  to  the  spatial  proximity  of  the  pixels  while  the  iterative  adaptation  taken 
from  IRX  simultaneously  eliminates  outliers.  Such  steps  allow  for  the  reduction  of  bias 
and  error  in  the  estimation  of  the  mean  vector  and  covariance  matrix,  thus  accounting  for 
a  portion  of  the  spatial  correlation  inherent  in  HSI  data.  Using  the  HYDICE  images, 
ILRX  has  been  shown  to  be  very  promising  unsupervised  anomaly  detection  algorithm. 
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3  Clustering  Hyperspectral  Imagery  for  Robust  Classification 


3.1  Introduction 

Hyperspectral  Imagery  (HSI)  is  a  method  used  to  collect  contiguous  data  across  a 
large  swath  of  the  electromagnetic  spectrum.  This  is  accomplished  by  using  a  specialized 
camera  mounted  on  an  aircraft  or  satellite  to  record  the  magnitude  of  the  bands  within  the 
collected  wavelengths  of  each  pixel  within  the  area  of  interest.  The  number  of  pixels  in  a 
hyperspectral  image  depends  on  the  resolution  of  the  camera  and  the  size  of  the  area 
being  imaged.  The  number  of  bands  recorded  is  upwards  of  200  or  more  (Shaw  and 
Manolakis,  2002),  and  typically  spans  the  range  from  ultraviolet  to  the  infrared  regions  of 
the  electromagnetic  spectrum.  The  vast  amount  of  data  contained  in  HSI  affords  an 
excellent  opportunity  to  detect  anomalies  using  multivariate  statistical  techniques,  as  each 
material  reflects  individual  wavelengths  of  the  spectrum  differently  (Landgrebe,  2002). 

Target  detection  algorithms  can  be  divided  into  two  groups:  anomaly  detection 
algorithms  and  classification  algorithms.  Anomaly  detection  algorithms  do  not  require 
the  spectral  signatures  of  the  anomalies  they  are  attempting  to  locate.  A  pixel  is  declared 
an  anomaly  if  its  spectral  signature  is  statistically  different  than  the  model  of  the  local  or 
global  background  that  it  is  being  tested  against.  This  implies  these  algorithms  cannot 
distinguish  between  anomalies,  they  only  make  a  decision  on  whether  or  not  a  pixel  is 
anomalous;  hence  the  application  can  be  considered  a  two-class  classification  problem 
(Shaw  and  Manolakis,  2002).  Classification  algorithms  attempt  to  identify  targets  based 
on  their  specific  spectral  signature,  however,  to  accomplish  this  they  require  additional 
information  in  the  form  of  a  spectral  library  (Manolakis  et  ah,  2009). 
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Ei smarm  et  al.  (2009)  claim  that  the  mean  vector  and  covariance  matrix  required 

for  anomaly  classification  can  be  estimated  globally  from  the  entire  image  data  under  the 

assumption  that  there  are  a  small  number  of  anomalies  in  the  image  and  this  has  an 

insignificant  effect  on  the  covariance  matrix.  This  statement  is  contested  in  Smetek 

(2007),  where  potential  ill  effects  of  a  small  number  of  anomalies  on  the  estimation  of  the 

covariance  matrix  are  detailed.  Similarly,  Manolakis  et  al.  (2009)  state  that: 

Possible  presence  of  targets  in  the  background  estimation  data  lead  to  the 
corruption  of  background  covariance  matrix  by  target  spectra.  This  may 
lead  to  significant  performance  degradation;  therefore,  it  is  extremely 
important  that,  the  estimation  of  p  and  X  should  be  done  using  a  set  of 
“target-free”  pixels  that  accurately  characterize  the  background.  Some 
approaches  to  attain  this  objective  include:  (a)  run  a  detection  algorithm, 
remove  a  set  of  pixels  that  score  high,  recompute  the  covariance  with  the 
remaining  pixels,  and  “re-run”  the  detection  algorithm,  and  (b)  before 
computing  the  covariance,  remove  the  pixels  with  high  projections  onto 
the  target  subspace.  (Manolakis  et  al.,  2009) 

This  research  demonstrates  that  classification  algorithms,  such  as  the  Adaptive 
Matched  Filter  (AMF),  may  be  improved  by  addressing  correlation  and  homogeneity 
problems  inherent  to  HSI  that  are  often  ignored  in  practice.  We  begin  by  showing  the 
benefit  of  using  an  anomaly  detector  to  remove  potential  anomalies  from  the  mean  vector 
and  covariance  matrix  statistics,  as  suggested  by  Manolakis  et  al.  (2009).  In  addition,  we 
show  further  benefits  by  addressing  the  non-homogeneity  of  HSI  through  the  use  of 
cluster  analysis  prior  to  classification. 

The  remainder  of  this  chapter  is  organized  as  follows.  Section  2  reviews  the  basics 
of  classification,  describes  the  Adaptive  Matched  Filter  (AMF)  as  well  as  AMF  variants 
used  in  this  research,  and  discusses  atmospheric  compensation.  Section  3  briefly  outlines 
the  seven  anomaly  detectors  implemented.  Section  4  discusses  clustering,  more 
specifically,  the  X-means  algorithm.  Section  5  details  the  methodology  implemented. 
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Section  6  presents  the  results  of  the  experiments.  Finally,  Section  7  concludes  the 
chapter. 

3.2  Classification 

This  section  describes  the  AMF  classification  algorithms  used  in  this  research. 
Three  new  variants  to  the  AMF  are  introduced  that  have  the  ability  to  classify  with 
improved  accuracy  by  addressing  correlation  and  homogeneity  problems  inherent  to  HSI. 
Elementary  atmospheric  compensation  is  also  discussed,  detailing  a  method  to  transform 
a  spectral  library  into  the  image  space  to  allow  for  proper  classification. 


3.2.1  Classification  Algorithms 

The  goal  of  a  HSI  statistical  classification  algorithm  is  to  determine  whether  or 
not  a  test  pixel  is  likely  made  of  the  same  material  as  a  target  pixel.  Define  the 
conditional  probability  density  of  the  test  pixel,  x,  as  realized  under  the  alternative 
hypothesis,  Ha  (same  as  the  target  pixel),  as  fa  (x  |  Ha ) ,  and  the  conditional  probability 
density  of  the  test  pixel,  x,  as  realized  under  the  null  hypothesis,  Ho  (not  the  same  as  the 
target  pixel),  as  f0  (x  |  H0 ) .  The  corresponding  likelihood  ratio  is 


F(x) 


f«(X\Ho) 


(7) 


If  F(x)  is  greater  than  the  user  defined  threshold,  t,  then  the  null  hypothesis  is  rejected, 
meaning  the  test  pixel  is  considered  a  target  pixel;  otherwise,  the  null  hypothesis  cannot 
be  rejected,  implying  the  test  pixel  is  not  considered  a  target  pixel  (Manalokis  et  ah, 
2007).  That  is  if  F(x)  >  t  the  test  pixel  is  considered  a  target  pixel  or  if  F(x)  <  t  the 
test  pixel  is  not  considered  a  target  pixel. 
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The  classification  algorithm  utilized  in  this  research  is  the  full-pixel  AMF  as 
defined  by  Manolakis  and  Shaw  (2002).  The  algorithm  assumes  the  target  spectra  and 
background  spectra  have  a  common  covariance  matrix,  2,  and  is  defined  by 


AMF  = 


(8) 


Additionally,  it  is  assumed  that  the  global  mean  is  removed  from  the  estimate  of  target 
spectral  signature,  5,  and  test  pixel  spectral  signature,  x.  The  spectral  signature  of  the 
target  of  interest  is  a  fixed  1  xp  vector  determined  from  a  spectral  library  or  the  mean  of 
a  sample  of  known  target  pixels  collected  under  the  same  conditions  (Manalokis  et  al., 
2009). 

3.2.2  Variants  of  the  AMF 

The  standard  AMF  and  three  variants  of  the  AMF  are  implemented  in  this 
research.  The  first  method  is  the  standard  AMF  as  described  above,  where  the  mean 
vector  and  covariance  matrix  are  taken  from  the  entire  image.  The  first  variant,  to  be 
called  Robust  AMF,  is  suggested  by  the  quote  from  Manolakis  et  al.  (2009)  in  the 
introduction  of  the  chapter.  In  this  method,  an  anomaly  detector  first  analyzes  the  image, 
then  the  mean  vector  and  covariance  matrix  are  estimated  from  the  image  without  the 
detected  anomalies.  The  second  variant  is  referred  to  as  Clustered  AMF.  In  this  method 
anomalies  are  removed  as  in  Robust  AMF,  next  the  image  is  clustered  without  the 
detected  anomalies.  Each  of  the  clusters  yields  a  mean  vector  and  covariance  matrix 
estimate.  The  corresponding  background  statistics  for  the  pixels  to  be  classified  are 
determined  through  the  modal  class  of  its  neighbors.  A  similar  idea  has  been  proposed 
with  anomaly  detection  (Stein  et  al.,  2000).  Due  to  the  time-consuming  nature  of 
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determining  which  cluster  a  pixel  is  located  in,  a  third  variant  is  developed  and  called 
Largest  Cluster  AMF.  This  method  removes  the  anomalies  and  clusters  the  resulting  data 
as  is  done  in  Clustered  AMF;  however,  the  mean  vector  and  covariance  matrix  for  the 
pixels  to  be  classified  are  estimated  from  the  single  largest  cluster  of  data  in  the  image. 

3.2.3  Atmospheric  Compensation 

Spectral  signature  matching  within  HSI  typically  incorporates  a  spectral  library 
consisting  of  ground  measured  reflectance  data  from  objects  of  interest.  The  difficulty 
with  spectral  signature  matching  is  that  hyperspectral  images  are  collected  using  a  sensor 
that  collects  pupil-plane  radiance,  which  includes  reflected  and  radiated  energy  as  well  as 
atmospheric  distortions.  Before  spectral  signatures  from  an  image  can  be  compared  to 
target  signatures,  atmospheric  compensation  must  be  performed  to  bring  the  spectral 
library  from  the  reflectance  space  to  the  pupil-plane  radiance  space.  Since  radiance  data 
is  a  function  of  atmospheric  conditions,  which  vary  greatly  by  collection  time,  the 
spectral  library  must  be  processed  with  each  image  separately  (Eismann,  2012). 

Linear  and  model  based  approaches  are  available  to  transform  data  from  the 
reflectance  space  to  the  radiance  space.  Model  based  approaches,  such  as  MODTRAN 
(Berk  et  ah,  1999),  require  prior  knowledge  about  the  scene  collection.  Linear  methods 
assume  that  atmospheric  content  is  a  linear  addition  where  the  pupil-plane  radiance  is  a 
function  of  reflectance  with  a  scaling  multiplier  and  offset 

Li=api+b,  (9) 

where  pt  is  a  reflectance  signature  to  be  transformed  into  the  L  radiance  space  with 
gain,  a ,  and  offset,  b,  as  calculated  by 
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a  = 


(10) 


^2  A 

Pi  —  A 

b  =  h Pi-LiPx  ;  (11) 

A  —  A 

where  /?,  and  p2  are  known  reflectance  signatures  from  the  spectral  library,  and  Lx  and 
L2  are  the  corresponding  radiance  measurements  from  the  scene.  Linear  methods  are 

comprised  of  two  general  types,  methods  such  as  the  Empirical  Line  Method  (ELM), 
which  require  known  objects  of  interest  to  be  within  the  spectral  library  and  located 
within  the  image;  and  vegetation  nonnalization  methods,  which  use  expected  radiance 
and  reflectance  of  vegetation  in  place  of  specific  known  objects.  Both  permit 
atmospheric  compensation  to  be  conducted  for  the  remaining  objects  in  the  spectral 
library  (Eismann,  2012). 

For  situations  lacking  prior  knowledge  of  scene  content,  methods  such  as 
vegetation  normalization  are  appropriate,  where  the  linear  method  in  equation  (9)  is 
applied  with  radiance  measurements  for  materials  expected  in  the  scene.  The  Normalized 
Difference  Vegetation  Index  (NDVI)  and  the  Bare  Soil  Index  (BI)  are  two  methods  that 
allow  atmospheric  compensation  to  be  performed  depending  on  the  scene  landcover 
(Eismann,  2012).  The  images  used  in  this  research  come  from  the  Hyperspectral  Digital 
Imagery  Collection  Equipment  (HYDICE)  (Rickard  et  ah,  1993)  sensor  for  the  Forest 
Radiance  I  and  Desert  Radiance  II  collection  events  in  1995.  The  images  were  collected 
with  210  bands  between  397  nm  and  2,500  nm  and  the  ground  reflectance  data  was 
collected  with  430  bands  between  350nm  and  2,500  nm.  The  collection  names  allude  to 
images  consisting  of  forest  and  desert  scenes.  Atmospheric  compensation  was  perfonned 
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with  NDVI  for  forest  images,  and  BI  was  applied  for  the  desert  images  due  to  the  lack  of 
vegetation. 

3.2.3. 1  Normalized  Difference  Vegetation  Index  (NDVI) 

NDVI  (Rouse  et  ah,  1973)  is  a  method  that  is  used  to  detennine  whether  a  pixel 
within  a  hyperspectral  image  is  green  vegetation.  It  does  this  by  comparing  the  radiance 
of  the  Near  Infrared  (NIR)  spectrum  to  the  red  spectrum 


NDVI  = 


NIR -red 
NIR  +  red 


(12) 


where  red  corresponds  to  the  600  -  700  mn  bands  and  NIR  corresponds  to  the  700  - 
1,000  mn  near  infrared  bands  (Eismann,  2012).  In  this  research,  we  used  bands 
corresponding  to  660  mn  for  red  and  860  mn  for  NIR. 

NDVI  is  calculated  for  each  pixel  within  an  image,  and  pixels  with  the  highest 
scores  can  be  used  as  vegetation  within  the  radiance  space.  Hence,  the  vegetation  in  the 
spectral  library  can  be  used  as  p2  in  equations  (10)  and  (11),  and  the  average  spectral 


signature  of  the  pixels  with  the  highest  NDVI  score  can  be  used  as  L2 .  Lx  can  be 


determined  from  the  shadows  within  an  image  which  can  be  estimated  by  the  spectral 
signature,  which  is  calculated  by  taking  the  minimum  value  from  each  band  in  the  image 
across  all  pixels.  Finally,  px  is  set  as  a  vector  of  zeros,  and  interpreted  as  the  ideal 


minimum  radiance  in  the  image  (Eismann,  2012). 

3.2.3.2  Bare  Soil  Index  (BI) 

BI  (Chen  et  ah,  2004)  is  a  method  similar  to  NDVI,  however,  it  is  designed  for 
bare  soil  within  a  hyperspectral  image.  It  can  be  employed  in  the  same  fashion  as  NDVI 
assuming  there  is  a  soil  measurement  within  the  spectral  library 
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_  (SWIR  +  red)  -  (NIR  +  blue) 
~  (SWIR  +  red)  +  (NIR  +  blue)  ’ 


(13) 


where  blue  corresponds  to  the  450  -  500  nm  bands,  red  corresponds  to  the  600  -  700  mn 
bands,  NIR  corresponds  to  the  700  -  1,000  nm  bands,  and  Short  Wave  Infrared  (SWIR) 
corresponds  to  the  1,150  -  2,500  nm  short-wave  infrared  bands  (Eismann,  2012).  In  this 
research,  we  used  the  band  corresponding  to  470  nm  for  blue,  660  nm  for  red,  860  nm  for 
NIR,  and  2,280  nm  for  SWIR. 

3.3  Anomaly  Detection 

To  employ  an  anomaly  detection  algorithm  to  hyperspectral  data,  first  the 
atmospheric  absorption  bands  should  be  removed  and  the  data  cube  must  be  reshaped 
into  a  data  matrix.  The  removal  of  the  absorptions  bands  in  the  images  employed  in  this 
research  results  in  the  retention  of  145  of  the  210  original  bands.  HSI  data  is  typically 
stored  in  a  three-dimensional  matrix,  referred  to  as  an  image  cube  or  data  cube,  with  the 
first  two  dimensions  of  the  matrix  being  the  location  of  the  pixel  in  the  image  and  the 
third  dimension  being  the  magnitude  at  each  of  the  recorded  electromagnetic  bands. 
Therefore,  it  can  be  viewed  as  a  stack  of  images  with  each  image  representing  the 
intensity  of  a  given  band.  An  x  p  data  matrix  is  generated  by  reshaping  the  data  cube 
into  a  matrix  with  the  first  dimension  containing  all  n  pixels  in  the  image  and  the  second 
dimension  containing  all p  bands.  After  the  data  is  in  the  proper  fonn,  Principal 
Component  Analysis  (PCA)  (Landgrebe,  2003)  is  employed  as  a  data  reduction  tool.  In 
all  of  the  algorithms  except  AutoGAD  the  user  is  left  to  determine  the  number  of 
Principal  Components  (PCs)  to  retain  in  order  to  reduce  the  dimensionality  of  the  data. 
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3.3.1  RX  Detector 


The  RX  algorithm  was  developed  by  Reed  and  Yu  (1990).  It  detects  anomalies 
through  the  use  of  a  moving  window.  The  pixel  in  the  center  of  the  window  is  scored 
against  the  other  pixels  in  the  window.  The  window  is  then  shifted  by  one  pixel  and  the 
process  is  repeated  until  each  pixel,  x,  has  received  an  RX  (x)  score  based  on 


RX(x)  =  (x-  //)7 


N 


-i-i 


N  +  \ 


1+ 


N  +  \ 


(x  -  ju)(x  -  jU)1 


(x  -  JU)  . 


(14) 


where  N  is  the  number  of  pixels  in  the  window  and  p  and  X  are  the  estimated  mean  and 
covariance  matrix  of  the  data  within  the  window.  Pixels  are  considered  anomalous  if 
their  RX  score  is  greater  than  a  chi-squared  distribution  with  corresponding  significance 
level,  a,  and  N-  1  degrees  of  freedom. 


3.3.2  Iterative  RX  (IRX)  Detector 

The  Iterative  RX  (IRX)  detector  was  introduced  by  Taitano  et  al.  (2010)  as  an 
extension  to  the  RX  detector  in  an  attempt  to  mitigate  the  effects  that  anomalies  have  on 
mean  vector  and  covariance  matrix  calculations.  The  algorithm  runs  RX  in  an  iterative 
fashion,  each  time  removing  pixels  flagged  in  the  previous  iteration  as  anomalous  from 
the  mean  vector  and  covariance  statistics  used  to  calculated  the  RX  scores.  This  process 
continues  until  the  set  of  anomalies  in  the  previous  iteration  matches  the  set  from  the 
current  iteration  or  the  maximum  number  of  iterations  has  been  completed  (Taitano  et  al., 
2010). 
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3.3.3  Linear  RX  (LRX)  and  Iterative  Linear  RX  (ILRX)  Detectors 

The  Linear  RX  (LRX)  and  Iterative  Linear  RX  (ILRX)  detectors  (Williams  et  al., 
2012)  function  in  the  same  manner  as  the  RX  and  IRX  except  a  vertical  line  of  data  is 
used  as  opposed  to  a  window.  If  the  number  of  pixels  selected  for  the  line  size  is  larger 
than  the  image,  then  pixels  are  taken  from  the  bottom  of  the  previous  column  and  the  top 
of  the  subsequent  column.  These  methods  are  advantageous  over  the  previously 
described  methods  because  they  increase  the  average  distance  between  the  test  pixel  and 
the  pixels  used  to  estimate  the  background  statistics,  thereby  decreasing  the  effects  of 
correlation  due  to  spatial  proximity  (Williams  et  al.,  2012). 

3.3.4  Autonomous  Global  Anomaly  Detector  (AutoGAD) 

The  Autonomous  Global  Anomaly  Detector  (AutoGAD)  (Johnson  et  al.,  2012)  is 
an  Independent  Component  Analysis  (ICA)  (Hyvarinen  et  al.,  2001;  Stone,  2004)  based 
detector  that  is  processed  in  four  phases.  Phase  I  reduces  the  dimensionality  of  the  data 
through  PCA  (Landgrebe,  2003),  using  the  geometry  of  the  eigenvalue  curve  to 
determine  the  number  of  PCs  to  retain.  Phase  II  conducts  ICA  on  the  retained  PCs  from 
Phase  I  via  the  FastICA  algorithm  (Hyvarinen,  1999).  Phase  III  determines  the 
Independent  Components  (ICs)  that  potentially  contain  anomalies  using  two  filters:  the 
potential  anomaly  signal  to  noise  ratio  and  the  maximum  pixel  score.  Phase  IV  smooths 
the  background  noise  in  the  ICs  selected  in  Phase  III  using  an  adaptive  Wiener  filter 
(Lim,  1990)  in  an  iterative  fashion  then  locates  the  potential  anomalies  using  the  Chiang 
et  al.  (2001)  zero  bin  method. 
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3.3.5  Support  Vector  Data  Description  (SVDD) 


Support  Vector  Data  Description  (SVDD)  was  originally  applied  to  HSI  data  by 
Banerjee  et  al.  (2006;  2007).  SVDD  is  a  semi-supervised  algorithm  that  requires  a 
training  set  of  background  data.  Since  HSI  images  are  usually  assumed  to  contain  few 
anomalies,  the  training  set  is  generated  by  randomly  selecting  pixels  from  the  image.  The 
minimum  volume  hypersphere  about  the  training  set,  S ( R,a )  =  \x :  ||jc  -  a||  <  R  \ ,  is  then 

calculated  with  center  a  and  radius  R.  The  hypersphere  is  determined  through 
constrained  Lagrangian  optimization  that  simplifies  to 

SVDD(y)  =  R2- K(y,y )  +  2^«,^(y,x,) ,  (15) 

i 

where  K(x,y)  is  the  kernel  mapping  defined  by 

A(x,y)  =  exp^-||x-_y||“  / cr2  j ,  (16) 

and  y  is  the  pixel  of  interest,  and  (Xj  are  the  weights  or  support  vectors,  and  a2  is  a  radial 
basis  function  parameter  used  to  scale  the  size  of  the  hypersphere.  Finally,  pixels  that 
have  a  SVDD  score  larger  than  a  user  defined  threshold  are  considered  anomalies 
(Banerjee  et  al.,  2006;  2007). 

3.3.6  Blocked  Adaptive  Computationally  Efficient  Outlier  Nominators  (BACON) 

Blocked  Adaptive  Computationally  Efficient  Outlier  Nominators  (BACON)  is  a 
statistical  outlier  detector  created  by  Billor  et  al.  (2000).  It  attempts  to  locate  outliers  in  a 
data  set  through  the  use  of  iterative  estimates  of  the  model  with  a  robust  starting  point. 
The  algorithm  is  computationally  efficient,  regularly  requiring  less  than  five  iterations  to 
converge,  so  it  is  applicable  to  HSI  data.  The  basic  idea  is  to  start  with  a  small  subset  of 
outlier  free  data  and  iteratively  add  blocks  of  data  to  the  data  set  until  all  data  points  not 
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considered  outliers  are  in  the  data  set.  The  final  data  set  is  then  assumed  to  be  outlier  free 


and  thus  can  be  used  to  generate  robust  mean  vector  and  covariance  matrix  estimates 
(Billor  et  ah,  2000). 

The  BACON  algorithm  begins  by  selecting  an  initial  basic  subset  of  data  with 
m  =  cp  data  points  with  the  smallest  Mahalanobis  distance,  where  in  the  case  of  HSI  p  is 
equal  to  the  number  of  bands  within  image  and  c  =  4  or  5,  as  suggested  by  Billor  et  al. 
(2000),  as  long  as  m  >  n  where  n  is  the  number  of  pixels  in  the  image.  Next,  the 
Mahalanobis  distances  are  calculated  for  each  of  the  pixels;  remembering  u  and  X  are 
now  the  mean  vector  and  covariance  matrix  of  the  basic  subset.  A  new  basic  subset  of  all 
of  the  pixels  with  distances  less  than  cnprx^a/2  is  selected,  where  zl,a/n  's  the  1  -  ( a/n ) 
significance  level  of  a  chi-squared  distribution  with  p  degrees  of  freedom  and 
c  =  cnp  +  chr  is  the  correction  factor  where 


p  +  \  2 

Cnp  ~  1  +  +  ,  o  ’ 

n  -  p  n  - 1  -  3  p 

(17) 

-~K+rj}' 

(18) 

h_{n  +  p  +  l) 

(19) 

Here  r  is  the  size  of  the  current  basic  subset,  n  is  the  number  of  observations,  or  pixels, 
and  p  is  the  dimensionality  of  the  data,  or  bands.  If  the  size  of  the  new  basic  subset  is  the 
same  size  of  the  basic  subset  from  the  previous  iteration,  the  algorithm  is  tenninated. 
Otherwise,  a  new  basic  subset  is  calculated  (Billor  et  al.,  2000). 
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3.4  Clustering 


Cluster  analysis  is  a  multivariate  analysis  technique  for  grouping,  or  clustering,  a 
dataset  into  smaller  subsets  known  as  clusters.  The  goal  of  cluster  analysis  is  to 
maximize  the  between-cluster  variation  while  minimizing  intra-cluster  variation  (Dillon 
and  Goldstein,  1984).  K-means  is  a  clustering  technique  where  the  data  is  split  into  k 
user-defined  number  of  clusters.  The  K-means  algorithm  is  initialized  by  randomly 
selecting  k  starting  points,  or  cluster  centers.  Next,  a  random  data  point  is  selected  and 
added  to  the  nearest  cluster.  The  corresponding  cluster  center  is  then  updated  with  the 
new  data,  allowing  for  currently  clustered  data  to  move  into  other  clusters.  This  process 
is  repeated  until  all  of  the  data  points  are  in  one  of  the  k  clusters  and  no  data  points  are 
moved  in  an  iteration  of  the  algorithm  (Dillon  and  Goldstein,  1984). 

The  main  difficulty  when  using  a  clustering  algorithm  such  as  K-means  is 
selecting  the  k,  the  number  of  clusters.  X-means  (Pelleg  and  Moore,  2000)  is  a  clustering 
algorithm  based  upon  K-means  that  has  the  ability  to  select  the  number  of  clusters.  This 
is  accomplished  by  running  K-means  multiple  times,  splitting  each  of  the  original  clusters 
in  two,  and  scoring  each  possible  subset  of  full  and  partial  clusters  using  the  Bayesian 
Infonnation  Criterion  (BIC)  to  detennine  the  optimal  clustering  (Pelleg  and  Moore, 

2000). 

Rather  than  supplying  the  X-means  algorithm  the  specific  number  of  clusters  as  in 
K-means,  the  user  defines  a  range  of  possible  clusters,  k-lower  and  k-upper.  The 
algorithm  begins  by  running  K-means  on  the  data  set  with  k  equal  to  k-lower.  Next,  each 
of  the  original  k  clusters  are  split  in  two  using  K-means  with  k  equal  to  two.  Then  all  2/ 
possible  combinations  of  whole  clusters  and  split  clusters  are  analyzed  for  the 
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corresponding  BIC  scores.  Then  k  is  incremented  and  the  process  is  repeated  until  k- 
upper  has  been  analyzed.  The  algorithm  then  returns  the  k  cluster  centers  with  the 
highest  corresponding  BIC  score  and  K-means  is  run  one  last  time  using  the  returned 
cluster  centers  as  the  initial  starting  points  (Pelleg  and  Moore,  2000). 

3.5  Methodology 

Nine  HYDICE  (Rickard  et  ah,  1993)  hyperspectral  images  were  employed  in  this 
research,  six  forest  images  and  three  desert  images,  as  shown  in  Figure  9  with  image 
details  displayed  in  Table  6.  The  first  step  was  to  analyze  an  image  using  one  of  the 
seven  anomaly  detectors  described  in  Section  3:  RX,  IRX,  LRX,  ILRX,  AutoGAD, 
SVDD,  or  BACON.  Each  algorithm  has  user  defined  settings  which  influence  the 
algorithms’  performance.  In  these  experiments,  the  RX  detectors  and  SVDD  used  the 
best  settings  as  reflected  in  Williams  et  al.  (2012),  the  settings  for  AutoGAD  were  taken 
from  Johnson  et  al.  (2012),  and  the  settings  for  BACON  were  taken  from  Billor  et  al. 
(2000).  The  anomalies  detected  by  the  anomaly  detector  were  used  twice:  first,  the 
anomalies  were  removed  from  the  image  to  calculate  a  robust  mean  vector  and 
covariance  matrix,  and  second,  the  anomalous  pixels  served  as  the  test  pixels  to  be 
classified  using  one  of  the  four  AMF  variants  described  in  Section  2.B:  standard  AMF, 
Robust  AMF,  Clustered  AMF,  and  Largest  Cluster  AMF. 
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Table  6:  Hyperspectral  Image  Data 


Image 

Size 

Total  Pixels 

Anomalous  Pixels 

Anomalies 

Unique  Targets 

IF 

191  x  160 

30,560 

994 

10 

5 

2F 

312  x  152 

47,424 

281 

30 

9 

3F 

226  x  136 

30,736 

96 

20 

11 

4F 

205  x  80 

16,400 

75 

29 

12 

5F 

470  x  156 

73,320 

440 

15 

20 

6F 

355  x  150 

53,250 

976 

45 

10 

ID 

215  x  104 

22,360 

490 

46 

22 

2D 

156  x  156 

24,336 

417 

4 

3 

3D 

460  x  78 

35,880 

405 

12 

12 

The  following  steps  were  performed  to  classify  anomalies.  First,  the  spectral 
library,  consisting  of  30  objects  for  forest  images  and  34  objects  for  desert  images,  was 
transformed  from  radiance  space  to  reflectance  space  using  NDVI  or  BI  atmospheric 
compensation  depending  on  the  image  scene,  as  depicted  in  the  bottom  Figure  10.  Next, 
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one  of  the  seven  anomaly  detectors  then  analyzed  the  image  for  anomalous  pixels,  as 
shown  in  the  top  Figure  10.  Pixels  below  the  anomaly  detector  threshold  (T i)  are 
classified  as  background.  Pixels  above  T i  are  classified  as  anomalies  and  are  used  as  the 
pixels  to  be  classified  by  the  AMFs.  The  mean  vector  (p)  and  covariance  matrix  (£)  are 
estimated  using  the  appropriate  set  of  data  for  the  AMF  variant.  The  standard  AMF  uses 
p  and  X  from  the  entire  image,  the  robust  AMF  uses  p  and  X  from  the  image  without  the 
detected  anomalies,  and  the  Clustered  AMF  and  Largest  Cluster  AMF  use  p  and  X  from 
the  appropriate  cluster  of  data,  as  detennined  by  the  X-means  algorithm.  Finally,  the  data 
is  processed  through  the  classifier  and  pixels  below  the  classifier  threshold  (T2)  are 
classified  as  background  and  pixels  above  T2  are  classified  as  appropriate  target  types. 


Figure  10:  Classification  Experimental  Process  Graph 


In  order  to  generate  Receiver  Operating  Characteristic  (ROC)  like  curves 
(Fawcett,  2006)  for  each  AMF/anomaly  detector  pair  across  all  nine  test  images,  the 
following  methodology  was  employed.  Each  anomaly  detector  was  run  across  a  range  of 
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anomaly  detector  thresholds  ( T\ '),  i  =  1,  2,  19  where  T\  =  0.01,  T\+x  =  T\  +  Ai,  and 

Ai  =  0.005.  The  pixels  flagged  as  anomalous  are  then  processed  through  the  four  AMFs. 
Each  target  in  the  spectral  library  is  compared  to  a  pixel  of  interest,  and  the  target  type 
with  the  largest  resulting  AMF  score  is  declared  given  its  score  is  above  the  AMF 
threshold  (7V),  j  =  1,2,...,  100  where  T2  =  0.01,  F2J+1  =  7V  +  A2,  and  A2  =  0.01.  The 
thresholding  implemented  in  this  research  is  created  by  finding  the  range  of  all  of  the 
AMF  scores  corresponding  to  the  target  type  with  the  largest  resulting  AMF  score  and 
nonnalizing  them  between  zero  and  one  to  allow  for  consistency  amongst  the  different 
AMFs  within  an  image.  When  a  threshold  is  selected,  any  pixel  with  an  AMF  score  less 
than  the  threshold  is  classified  as  background.  Analyzing  each  of  the  different  detector 
thresholds,  across  all  four  of  the  AMFs,  while  allowing  77J  to  vary  across  the  entire  range 
of  possible  values  enumerates  all  combinations  of  T[  and  77 . 

As  the  images  are  processed,  a  2  x  3  confusion  matrix,  as  displayed  in  Figure  11, 
was  generated  for  each  T!  ,  77 ,  image  combination,  denoted  C  ,  (77),  where  i  and  j 

represent  specific  threshold  values  and  k  is  the  image  of  interest.  The  confusion  matrix  is 
comprised  of  two  sections  to  allow  for  the  scoring  of  every  pixel  in  the  image.  The 
anomaly  detector  section  reflects  the  declaration  of  a  test  pixel  as  background  or  the 
passing  of  the  pixel  to  be  classified.  Relative  to  the  detector  declaration  of  background, 
there  are  False  Negatives  (FN)  and  True  Negatives  (TN).  The  anomaly  classifier  section 
then  reflects  the  ultimate  classification  of  the  pixels  classified  as  anomalous  by  the 
anomaly  detector. 
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Figure  11:  2x3  Confusion  Matrix 


After  each  of  the  nine  images  are  processed,  new  2^3  confusion  matrices  are 

generated  which  consist  of  the  sum  across  all  nine  images  for  each  Ck,  ,  combination 

7]  T2 


defined  by 


(*)■ 


k= 1 


(20) 


where  /  is  the  number  of  images  analyzed,  here  1=9.  Below,  the  true  positive  fraction 
(classifier)  versus  false  positive  fraction  (classifier)  data  point  for  each  Cr,rJ  was  plotted 

for  a  given  anomaly  detector,  anomaly  classifier  combination  and  the  frontier  of  the 
resulting  data  is  interpreted  as  a  rough  ROC  curve,  as  is  shown,  notionally,  in  Figure  12 
where  the  circles  are  data  points  and  the  line  represents  the  ROC  curve. 
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FPF 


Figure  12:  ROC  Curve  Generated  from  the  Frontier  of  the  Data 


3.6  Results 

The  four  AMFs  and  the  seven  anomaly  detectors  were  scored  to  produce  ROC 
curves  as  described  in  Section  5,  with  the  results  shown  in  Figure  13  and  Figure  14.  Each 
figure  contains  separate  graphs  of  classification  performance  generated  from  each 
anomaly  detector.  Figure  13  shows  the  ROC  curves  for  the  full  process  (detection  plus 
classification).  This  means  taking  into  account  the  full  2x3  confusion  matrix,  implying 

TP 

TPF  = - £ - ,  (21) 

tpc  +  fnc+fnd 

FP 

FPF  = - F -  (22) 

FPc  +  TNC  +  TNd 

where  the  subscripts  C  and  D  denote  classifier  and  detector,  respectively.  Note  the  low 
TPF  and  extremely  small  FPF  come  from  the  fact  that  nine  images  with  a  total  of  334,226 
pixels  were  analyzed  in  the  process  and  these  statistics  are  biased  downward  by  a  large 
number  of  FND  and  TND.  The  key  insight  to  be  gained  from  these  ROC  curves  is  that  in 
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all  cases  the  variants  outperform  the  standard  AMF.  Furthermore,  the  clustering  methods 
enhance  the  Robust  AMF. 


To  get  a  better  visualization  of  the  data,  a  set  of  conditional  ROC  curves  were 
created.  Figure  14  displays  the  ROC  curves  which  account  for  the  perfonnance  of  the 
AMF  given  detections,  implying 


TPF  = 


TPC 

TPC  +  FNC  ’ 


(23) 


FPF  = 


FPC 

FPC  +  TNC  ' 


(24) 


Again,  we  see  the  variants  outperform  the  standard  AMF  and  in  most  cases  the  clustering 
methods  enhance  the  Robust  AMF. 
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Figure  13:  ROC  Curves  for  Full  Process 
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3.7  Conclusions 


This  research  demonstrates  improvements  to  Adaptive  Matched  Filter  (AMF) 
performance  by  addressing  correlation  and  non-homogeneity  problems  inherent  to  HSI 
data,  which  are  often  ignored  when  classifying  anomalies.  The  standard  AMF  and  three 
variants  were  employed  along  with  seven  different  anomaly  detectors  utilized  prior  to 
classification.  Manolakis  et  al.  (2009)  state  the  estimation  of  the  mean  vector  and 
covariance  matrix  should  be  calculated  using  “target- free”  data,  generating  a  Robust 
AMF.  Through  the  use  of  prior  anomaly  detection  the  Robust  AMF  showed  improved 
performance  over  the  standard  AMF.  Additionally,  classification  was  further  enhanced 
by  simultaneously  addressing  the  non-homogeneity  of  HSI  data  by  selecting  the  required 
statistics  from  the  appropriate  cluster  of  “target- free”  data. 
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4  Further  Extensions  to  Robust  Parameter  Design:  Three  Factor  Interactions 


with  an  Application  to  Hyperspectral  Imagery 


4.1  Introduction 

Hyperspectral  Imagery  (HSI)  is  a  method  of  collecting  vast  amounts  of  information 
from  the  electromagnetic  spectrum.  In  principle,  a  HSI  sensor  is  similar  to  a  standard 
digital  camera.  However,  instead  of  the  three  wavelengths,  or  bands,  collected  by  a 
digital  camera,  a  HSI  sensor  collects  upwards  of  250  different  bands.  These  bands 
typically  span  the  visible  spectrum  up  through  parts  of  the  near-infrared  spectrum  (Shaw 
and  Manolakis,  2002). 

When  an  object  is  covered  with  camouflage  netting  it  can  be  difficult  to 
distinguish  in  a  photograph.  However,  since  distinct  objects  reflect  electromagnetic 
energy  differently,  the  same  hidden  object  could  be  visible  within  specific  bands  beyond 
those  included  in  a  color  image.  This  potential  provides  opportunities  for  locating 
unusual  objects  within  HSI  (Landgrebe,  2003).  This  is  typically  accomplished  by  one  of 
two  methods:  local  or  global  anomaly  detection.  A  local  anomaly  detector  uses  a 
window  that  moves  through  the  image.  The  pixel  at  the  center  of  the  window  is  tested 
against  the  other  pixels  in  the  window  to  determine  if  it  is  anomalous  (Stein  et  ah,  2002). 
The  classic  example  of  a  local  anomaly  detector  is  the  RX  detector  (Reed  and  Yu,  1990); 
others  include  the  Adaptive  Causal  Anomaly  Detection  (Hsueh  and  Chang,  2004),  the 
Multi-Window  Anomaly  detector  (Liu  and  Chang,  2008),  the  Iterative  RX  detector 
(Taitano  et  ah,  2010)  and  Iterative  Linear  RX  detector  (Williams  et  ah,  2012;  2010).  A 
global  anomaly  detector  uses  all  of  the  data  in  the  image  to  determine  a  model  of  the 
background,  assuming  sparse  anomalies.  The  individual  pixels  in  the  image  are  then 
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tested  against  this  model  to  determine  if  they  are  anomalous  (Stein  et  ah,  2002). 

Examples  of  a  global  anomaly  detector  include  Joint  Subspace  Detection  (JSD)  (Schaum, 
2004),  Support  Vector  Data  Description  (SVDD)  (Banerjee  et  ah,  2006),  and  the 
Autonomous  Global  Anomaly  Detector  (AutoGAD)  (Johnson  et  ah,  2012). 

An  anomaly  detector  such  as  AutoGAD  requires  the  user  to  provide  various 
parameters  and  thresholds  in  order  to  analyze  an  image.  These  user-defined  settings  can 
be  considered  controllable  factors,  or  control  variables,  and  potentially  have  a  large  effect 
on  algorithm  performance.  Mindrup  et  al.  (2010)  proposed  that  certain  attributes  of 
hyperspectral  images  could  be  thought  of  as  uncontrollable  factors,  or  noise  variables. 

The  presence  of  controllable  and  uncontrollable  factors  in  HSI  anomaly  detection 
algorithms  suggests  the  use  of  Robust  Parameter  Design  (RPD)  to  locate  the  best  settings 
for  the  algorithm.  Mindrup  et  al.  (2012)  showed  that  the  standard  RPD  model  defined  by 
Myers  et  al.  (2009)  might  not  be  sufficient  for  use  with  more  complex  data,  such  as  HSI, 
and  extended  the  model  to  include  noise  by  noise  (N  x  N)  interactions.  Subsequent 
research  indicates  that  further  extensions  to  the  RPD  framework  might  prove  efficacious 
relative  to  the  HSI  application.  This  research  extends  the  work  by  Mindrup  et  al.  (2012) 
to  include  control  by  noise  by  noise  (C  x  N  x  N)  and  noise  by  control  by  control 
(N  x  C  x  C)  interactions.  Similarly,  these  new  models  are  applied  to  the  AutoGAD 
algorithm  to  locate  improved  settings. 

This  chapter  is  organized  in  the  following  fashion.  In  Section  2,  the  statistical 
framework  of  RPD  is  reviewed,  followed  by  a  summary  of  the  recent  N  x  N 
embellishment  of  Mindrup  et  al.  (2012).  The  section  is  closed  with  the  addition  of 
C  x  N  x  N  and  N  x  C  x  C  interactive  terms  to  the  model.  Section  3  describes  AutoGAD, 
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the  algorithm  used  to  demonstrate  a  real  world  implementation  of  the  RPD  extensions. 
Section  4  presents  the  experiment  conducted  with  AutoGAD.  Finally,  Section  5 
concludes  the  chapter. 

4.2  Robust  Parameter  Design 

RPD  was  formally  introduced  to  the  United  States  in  the  1980s  by  Genichi 
Taguchi  (Taguchi,  1986;  1987)  a  Japanese  quality  consultant.  RPD  is  a  method  for 
selecting  the  best  levels  of  controllable  factors,  or  control  variables,  within  a  system  with 
a  focus  on  the  system  variability.  This  method  assumes  that  the  majority  of  system 
variance  comes  from  uncontrollable  factors,  or  noise  variables,  and  that  these 
uncontrollable  factors  may  be  controlled  in  the  experimental  designs  used  for  the  RPD. 
The  underlying  RPD  model  is  a  function  of  the  control  variables,  x,  and  the  noise 
variables,  z.  The  goal  of  RPD  is  to  select  the  levels  of  control  variables  such  that  the 
system  is  robust  to  the  variance  from  the  noise  variables  (Robinson  et  ah,  2004). 

Lin  and  Tu  (1995)  suggest  a  Mean  Squared  Error  (MSE)  approach  to  determine 
the  optimal  control  settings  by  minimizing  a  function  of  process  mean,  ju  ,  and  variance, 

cr2 .  The  function  to  be  minimized,  within  the  experimental  design  space,  varies 
according  to  whether  a  minimum  mean,  maximum  mean,  or  target  mean  (T)  is  desired,  as 
shown  by  the  following  functions: 

LX  ,=/r+<72,  (25) 

LT(max  mean)  =  “A"  +  ^  ’  (26) 

LT(targetmean,=(A-^)2+^-  (2?) 
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The  general  form  of  the  standard  RPD  model  (Std)  assumes  that  there  are  no  noise 
by  noise  and  quadratic  noise  terms  within  the  model  resulting  in 

T(std)  =  A  +  x' P  +  x'Bx  +  z'y  +  x'Az  +  s  ,  (28) 

which  can  be  rewritten  as 

T(std)  =Po  +x'/3  +  x'Bx  +  (y'+x'A)z  +  £,  (29) 

where  x  is  an  in  x  1  vector  of  control  variables,  z  is  an  n  x  1  random  vector  of  noise 
variables,  f30  is  the  intercept,  /?  is  the  m  x  1  vector  of  control  variable  coefficients,  B  is 
the  m  x  m  matrix  of  quadratic  control  variable  coefficients,  y  is  the  n  *  1  vector  of  noise 
variable  coefficients,  A  is  the  m  x  n  matrix  of  control  by  noise  variable  interaction 
coefficients,  and  the  random  error  associated  with  the  model  is  s  ~  N^O,  cr2) .  The 

random  noise  variables,  z,  are  assumed  such  that  £'(z)  =  0  and  var(z)  =  X  .  The 
expected  value  and  variance  models,  see  the  appendix,  are  then 

E  T(std)  =  +  x' p  +  x'Bx  (30) 

and 

var[T(s,d)]  =  (r’+^’A)2:z(y'+x’A)'+f72,  (31) 

where  a  is  estimated  by  the  MSE  from  the  fitted  model  (Myers  et  al.,  2009). 

Substituting  equation  (30)  for  ju  and  equation  (31)  for  a2  in  any  of  the  LT  functions 
described  in  equations  (25)  -  (27)  reveals  that  the  LT  function  is  only  a  function  of  the 
control  variables,  x.  This  implies  that  the  optimal  control  variables  can  be  determined 
through  constrained  optimization  of  the  LT  function  without  explicit  noise  variable 
consideration  (Koksoy,  2006). 
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Mindrup  et  al.  (2012)  derived  the  mean  and  variance  model  necessary  to 
implement  the  noise  by  noise  RPD  model  herein  denoted  as  N  x  N. 

V(NxN)  =j30  +  x'j3  +  x'Bx  +  (y'+x'A)z  +  z'®z  +  £,  (32) 

E  3)n x n)  =  P0+x' P  +  x'Bx  +  tr{^z),  (33) 

var[ v(N xN)]  =  (/'+  x'A)E.  (y'+  x'A)'+  2tr(OE.)2  +  a2 ,  (34) 

where  O  is  the  n  x  n  matrix  of  quadratic  noise  variable  coefficients  and  tr  represents  the 
trace  of  a  matrix.  The  derivations  of  the  expectation  and  variance  models  are  given  in  the 
appendix. 

Here  the  Mindrup  et  al.  (2012)  model  is  extended  two  steps  forward.  The  first 
model  includes  control  by  noise  by  noise  (C  x  N  x  N)  interactions.  Its  expansion  is  given 
in  equation  (35)  with  mean  and  variance  expressions  given  in  equations  (37)  and  (38). 

The  second  model  adds  noise  by  control  by  control  (N  x  C  x  C)  interactions  to  the 
previous  model.  Its  expression  is  given  in  equation  (39)  with  mean  and  variance 
expressions  given  in  equations  (41)  and  (42).  The  derivations  of  the  expectation  and 
variance  expressions  for  both  models  are  provided  in  the  appendix. 

m 

Here  T't  =  ^vT,x,  where  HT  is  an  n  x  n  matrix  of  control  by  noise  by  noise 

i=i 

tenns  corresponding  tox,,  and  x  =  [x’QjX,  x’Q2x,  ...  ,x’Q(!x] ,  where  Q;  is  an  m  x  m 
matrix  of  noise  by  control  by  control  terms  corresponding  to  Zj.  The  C  x  N  x  N  model  is 
T(cxNxN)  =  A)  +x'j3  +  x'Bx  +  (y'+x'A)z  +  z'<I>z  +  z'x¥xz  +  £,  (35) 

which  can  be  rewritten  as 

T(cxn*n)  =  fio+x'fi  +  x'Bx  +  {r'+x'A)z  +  z'(®  +  xi'x)z  +  £,  (36) 
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£[%  *  N , »)]  =  A.  +  X'P  +  x'Bx  +  fr((®  +  ) ,  (37) 

var[y(c ,  N ,  N)]  =  (y’+-v'A)X.(r'+  V  A)'+  2fr((4>  +  ¥,)E:)’  +  <y2 .  (38) 

The  N  x  C  x  C  model,  which  is  an  extension  to  the  C  x  N  x  N  model,  is 
V(NxC  xc)  =  A  +  x' P  +  x'Bx  +  (y'+x'A)z  +  z'(cp  +  Wx)z  +  xz  +  s ,  (39) 

which  can  be  rewritten  as 

V(NxC  xc)  =  Po  +x'  P  +  x'&x  +  {y  '+x’A  +  x)z  +  z'(ch  +  'F  x)z  +  s,  (40) 

£[T(n  xCxC)]  =  A+^  +  X'BX  +  tr{(®  +  Yx)Sz)  »  (41) 

™[y(KXcxc)]  =  (r'+x'&+x)Mr'+x'A+x)'+M(0  +  w*)^)2  +  (72-  ^ 


4.3  Autonomous  Global  Anomaly  Detector 

AutoGAD  (Johnson  et  al.,  2012)  is  a  fully  autonomous  HSI  anomaly  detection 
algorithm  that  is  comprised  of  four  phases  where  the  only  required  infonnation  from  the 
user  is  the  data  in  the  proper  form  and  the  required  input  parameters,  described  in  Section 
4.4. 1 .  This  suggests  that  AutoGAD  would  be  an  excellent  candidate  for  an  RPD 
experiment  using  the  input  parameters  as  control  variables  and  image  properties 
processed  as  noise  variables.  The  objective  of  the  experiment  being  the  determination  of 
robust  settings  for  the  input  parameters.  The  experiment  described  in  Section  4.4 
parallels  the  work  of  Mindrup  et  al.  (2012). 

4.3.1  Image  Preprocessing 

The  data  from  a  hyperspectral  image  is  stored  in  a  three  dimensional  data  matrix 
referred  to  as  a  data  cube  where  the  vertical  and  horizontal  dimensions  refer  to  the  height 
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and  width  of  the  image  and  the  third  dimension  refers  to  the  number  of  spectral  bands 
collected.  The  data  cube  can  be  thought  of  as  a  collection  different  images  of  the  scene, 
one  at  each  collected  spectral  resolution  (Landgrebe,  2003).  For  AutoGAD  to  detect 
anomalies  within  an  image  it  must  be  converted  into  the  proper  form.  First,  the  data 
needs  to  be  reshaped  from  a  three-dimensional  data  cube  to  a  two-dimensional  data 
matrix,  consisting  of  the  total  number  of  pixel  by  the  total  number  of  spectral  bands.  The 
next  step  is  to  remove  the  atmospheric  absorption  bands,  those  bands  where  the  energy  is 
almost  completely  absorbed  by  the  atmosphere  (Eismann,  2012).  This  reduces  the 
number  of  bands  in  the  images  used  in  this  research  from  210  to  145. 

4.3.2  Feature  Extraction  I  (Phase  I) 

After  the  image  has  been  preprocessed  into  the  proper  form  the  dimensionality  of 
the  data  is  further  reduced  using  Principal  Components  Analysis  (PCA).  PCA  is  a  linear 
transfonnation  that  maps  the  original  data  to  a  new  space  where  all  of  the  columns  of  the 
data  are  uncorrelated.  The  resulting  Principal  Components  (PCs)  are  sorted  such  that  the 
first  PC  contains  the  most  variance  from  the  original  data  and  the  last  PC  contains  the 
least  (Anderson,  2003;  Landgrebe,  2003).  Therefore,  the  original  data  can  be  reduced  by 
selecting  only  the  most  significant  PCs. 

One  of  the  more  popular  methods  to  select  the  significant  PCs  involves  using  the 
percentage  of  total  variance  explained  (Dillon  and  Goldstein,  1984).  Farrell  and 
Mersereau  (2005)  showed  that  this  method  was  extremely  erratic  in  HSI  applications. 
Johnson  et  al.  (2012)  proposed  a  method  they  call  the  Maximum  Distance  Secant  Line 
(MDSL).  This  process  finds  the  “knee”  in  the  eigenvalue  curve  to  determine  how  many 
PCs  to  select.  It  has  been  shown  to  work  well  in  practice  and  is  employed  here. 
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Once  the  significant  PCs  are  selected  using  the  MDSL  method,  the  resulting 
reduced  data  set  is  whitened.  The  new  whitened  reduced  data  set  is  then  processed  in  the 
next  phase  of  the  algorithm. 

4.3.3  Feature  Extraction  II  (Phase  II) 

Independent  Component  Analysis  (ICA)  (Hyvarinen  et  ah,  2001;  Stone,  2004), 
via  the  Fasti CA  algorithm  (Hyvarinen,  1999),  is  performed  on  the  data  with  the  aim  of 
producing  statistically  independent  data.  Each  of  the  Independent  Components  (ICs)  are 
then  reshaped  back  into  an  matrix  the  size  of  the  original  image  referred  to  as  an 
abundance  map  (Johnson  et  ah,  2012). 


4.3.4  Feature  Selection  (Phase  III) 

In  this  phase  of  the  algorithm  the  abundance  maps  are  analyzed  to  determine 
those  which  potentially  contain  anomalies.  This  is  accomplished  through  the  use  of  two 
filters:  the  Potential  Anomaly  Signal  to  Noise  Ratio  (PA  SNR)  and  the  maximum  pixel 
score.  The  PA  SNR  is  defined  as: 


PA  SNR  =  10 -log, 


variance  (potential  anomalies) 
variance  (background) 


(43) 


The  difficulty  in  applying  this  statistic  lies  in  determining  the  threshold  between  the 
background  and  potential  anomalies.  Johnson  et  al.  (2012)  employ  a  method  described 
by  Chiang  et  al.  (2001)  that  suggests  potentially  anomalous  pixels  occur  after  the  first 
zero  bin  in  the  histogram  of  the  data. 

Each  pixel  score  is  standardized  within  an  abundance  map.  This  situation  allows 
for  the  selection  of  a  threshold  to  flag  the  potential  for  anomalies.  The  maximum  pixel 
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score  in  each  abundance  map  is  compared  to  the  threshold.  If  the  threshold  is  exceeded 
then  the  abundance  map  associated  with  that  scene  is  a  map  that  potentially  flags 
anomalies  (Johnson  et  ah,  2012). 

The  abundance  maps  with  corresponding  values  of  PA  SNR  and  maximum  pixel 
score  found  to  be  above  both  user-defined  thresholds  are  analyzed  in  phase  four  for  the 
presence  of  anomalies  (Johnson  et  ah,  2012). 

4.3.5  Identification  (Phase  IV) 

The  first  step  in  the  identification  phase  is  to  smooth  out  background  noise  of  the 
data  using  an  adaptive  Wiener  filter  (Lim,  1990).  Johnson  et  al.  (2012)  implement  this 
process  iteratively  to  create  a  smoother  background/noise  separation  and  call  this  method 
an  Iterative  Adaptive  Noise  (IAN)  filter.  Subsequently,  to  determine  which  pixels  in  the 
selected  abundance  maps  are  anomalous,  the  Chiang  et  al.  (2001)  method  is  implemented 
a  second  time  and  pixels  after  the  first  zero  bin  are  flagged  as  anomalies. 

4.4  RPD  Experiments  with  AutoGAD 

In  this  experiment  the  input  parameters  in  AutoGAD  are  considered  to  be  control 
variables  and  following  the  work  of  Mindrup  et  al.  (2012),  certain  properties  of  the 
images  are  treated  as  noise  variables.  The  main  difference  between  the  experimental 
design  used  in  this  research  and  the  work  of  Mindrup  et  al.  (2012)  concerns  our  use  of  15 
images  for  training  and  5  for  test  as  opposed  to  10  for  training  and  10  for  test  in  the 
previous  work. 


58 


4.4.1  AutoGAD  Input  Parameters 


AutoGAD  has  nine  user-defined  input  parameters: 

•  Dimension  Adjust:  number  of  PCs  above  or  below  the  knee  in  the  eignvalue  curve  to 
retain. 

•  Bin  Width  SNR:  size  of  the  histogram  bins  used  to  determine  a  separation  between 
the  background  and  target  classes  for  PA  SNR. 

•  PA  SNR  Threshold:  SNR  between  the  variance  of  the  potential  anomalies  and  the 
variance  of  the  background  used  to  nominate  an  abundance  maps  as  potentially 
containing  anomalies. 

•  Max  Score  Threshold:  maximum  pixel  score  used  to  nominate  abundance  maps  as 
potentially  containing  anomalies. 

•  Low  PA  SNR  Threshold:  determination  of  whether  a  high  or  low  number  of 
iterations  are  used  in  IAN  filtering. 

•  IAN  Filtering  Iterations  (High  SNR):  number  of  iterations  for  IAN  filtering  when  the 
PA  SNR  is  determined  to  be  high. 

•  IAN  Filtering  Iterations  (Low  SNR):  number  of  iterations  for  IAN  filtering  when  the 
PA  SNR  is  determined  to  be  low. 

•  Window  Size:  size  of  the  window  used  in  IAN  filtering. 

•  Bin  Width  Identify:  size  of  the  histogram  bins  used  to  determine  a  separation 
between  the  background  and  target  classes  for  identifying  pixels  as  anomalies. 


Johnson  et  al.  (2012)  suggests  that  the  values  displayed  in  Table  7  be  used  for 
favorable  perfonnance  of  the  algorithm.  They  observe  that  if  the  settings  for  the  IAN 
filtering  are  altered  away  from  the  suggested  settings,  specifically  window  size,  the 
performance  of  the  algorithm  is  negatively  affected.  Additionally,  the  original  settings 
and  the  improved  settings,  from  the  Johnson  et  al.  (2012)  paper,  both  suggest  a  value  of 
-1  be  used  for  the  dimension  adjust  parameter.  These  observations  led  to  the  decision  to 
fix  the  three  IAN  filter  parameters  and  the  dimension  adjust  parameter  for  the  purpose  of 
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the  RPD  experiment.  The  remaining  five  input  parameters  were  selected  to  be  control 
variables,  as  displayed  in  Table  7,  with  the  corresponding  test  range. 


Table  7:  AutoGAD  Suggested  Parameters  and  Test  Range 


Parameter 

Suggested 

Test  Range 

Dimension  Adjust 

-1 

-1 

Bin  Width  SNR  (x2) 

0.05 

[0.01-0.09] 

PA  SNR  Threshold  (x2) 

2 

[1-5] 

Max  Score  Threshold  (x3) 

10 

[6-14] 

Low  PASNR(x4) 

10 

[6-14] 

IAN  Filtering  Iterations  (High  SNR) 

100 

100 

IAN  Filtering  Iterations  (Low  SNR) 

20 

20 

Window  Size 

3 

3 

Bin  Width  Identify  (x5) 

0.05 

[0.01-0.09] 

4.4.2  Training  and  Test  Images 

The  ten  images  used  in  this  experiment  come  from  the  Hyperspectral  Digital 
Imagery  Collection  Equipment  (HYDICE)  sensor  during  Forrest  Radiance  I  and  Desert 
Radiance  II  collection  events.  To  provide  sufficient  imagery  for  training  and  testing  the 
original  images  were  each  split  into  two  images,  resulting  in  a  total  of  20  images,  ten  top 
halves  and  ten  bottom  halves.  Fifteen  image  halves  were  randomly  selected  for  training 
set  and  the  remaining  five  image  halves  were  used  in  the  test  set. 

To  employ  the  20  images  as  noise  variables  three  noise  characteristics  defined  by 
Mindrup  et  al.  (2010)  were  calculated  for  each  image:  Fisher  ratio,  ratio  of  anomalous  to 
background  pixels,  and  number  of  clusters.  Fisher  ratio,  z, ,  provides  a  good  class 

separablility  measure  (Duda  et  al.,  2001;  Mao,  2002).  Mindrup  et  al.  (2010)  defined  the 
Fisher  ratio  for  a  hyperspectral  image  as  the  average  of  the  Fisher  ratios  of  the  K  bands  in 
the  given  image  resulting  in 
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(44) 
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where  fj,  and  o]:  are  the  mean  and  variance  of  the  anomalous  pixels  of  image  i  and 

band  k,  and  similarly  juh  and  &1  are  the  mean  and  variance  of  the  background  pixels  of 

image  i  and  band  k,  with  respect  to  the  image’s  truth  mask. 

The  ratio  of  anomalous  to  background  pixels,  z0 ,  is  calculated  using  a  truth  mask 


by 


z 


2 1 


(45) 


where  aj  is  the  number  of  anomalous  pixels  in  image  i  and  bt  is  the  number  of 

background  pixels  in  image  i,  with  respect  to  the  image’s  truth  mask  (Mindrup  et  ah, 

2010). 

The  number  of  clusters  in  the  image,  z3 ,  is  the  number  of  homogeneous  groups 

within  the  image  as  calculated  by  the  X-means  algorithm  (Pelleg  and  Moore,  2000). 
Table  8  displays  the  training  and  test  images  along  with  their  calculated  noise  values. 
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T able  8:  Training  and  Test  Images  with  Calculated  Noise  Values 


Image 

Image 

Half 

Fisher 

Ratio  (Zi) 

Percent 

Target  (z2) 

Numberof 

Clusters  (z3) 

ID 

Top 

1.7797 

0.0043 

3 

ID 

Bottom 

1.6265 

0.0028 

3 

IF 

Bottom 

0.3148 

0.0225 

5 

2D 

Top 

0.0957 

0.0247 

4 

2D 

Bottom 

0.1762 

0.0288 

3 

2F 

Top 

0.9633 

0.0084 

7 

Training 

Set 

2F 

Bottom 

0.9311 

0.0085 

7 

3D 

Top 

0.1695 

0.0034 

3 

3F 

Top 

0.2650 

0.0053 

8 

3F 

Bottom 

0.2153 

0.0078 

5 

4D 

Top 

1.4093 

0.0156 

6 

4D 

Bottom 

2.6382 

0.0275 

4 

4F 

Bottom 

0.0779 

0.0063 

8 

5F 

Bottom 

0.7412 

0.0094 

7 

6F 

Top 

0.2658 

0.0109 

6 

IF 

Top 

0.4335 

0.0392 

5 

3D 

Bottom 

1.4299 

0.0033 

3 

Test  Set 

4F 

Top 

0.0826 

0.0046 

7 

5F 

Top 

0.1991 

0.0078 

10 

6F 

Bottom 

1.8451 

0.0052 

4 

4.4.3  AutoGAD  Responses 

With  each  instance  of  the  AutoGAD  algorithm  the  True  Positive  Fraction  (TPF) 
and  Label  Accuracy  (LA)  are  collected.  TPF  is  the  ratio  of  the  number  of  anomalous 
pixels  correctly  classified  to  the  total  number  of  anomalous  pixels  in  the  image.  LA  is 
the  ratio  of  the  number  of  anomalous  pixels  correctly  classified  to  the  total  number  of 
pixels  classified  as  anomalous.  The  response  used  for  the  experimental  design  was 

y  =  TPF  +  LA.  (46) 

This  response  is  used  for  the  experiment  because  it  captures  the  viewpoints  of  both  the 
designer  of  the  algorithm  and  a  potential  user  of  the  algorithm.  Since  both  TPF  and  LA 
have  a  range  of  [0,  1]  the  response  has  the  range  of  [0,  2], 
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4.4.4  Experimental  Design 


To  conduct  the  training  portion  of  the  experiment  a  Face  Centered  Cube  (FCC) 
design  (Montgomery,  2008)  was  selected  for  the  five  control  variables,  to  include  16 
center  runs  and  five  repetitions.  The  resulting  FCC  was  then  crossed  with  the  15  images 
randomly  selected  for  the  training  set  for  a  total  of  4,350  runs.  The  replicates  were  used 
to  account  for  the  variability  of  the  FastICA  algorithm  implemented  within  AutoGAD;  all 
other  aspects  of  AutoGAD  are  detenninistic. 

Before  perfonning  regression  on  the  data  to  create  the  four  RPD  models,  the  five 
control  variables,  x,  -  x5 ,  were  coded  [-1,  1]  and  the  three  noise  variables,  zl-z3,  were 

standardized  individually.  The  standardization  of  the  noise  variables  results  in  the 
correlation  matrix  shown  below. 


2 


z 


1  -0.0320  -0.2330 

-0.0320  1  -0.2460 

-0.2330  -0.2460  1 


4.4.5  Results 

Table  9  displays  pertinent  data  from  the  four  regression  models.  Notice  that  as 

2  2 

the  model  becomes  more  complex,  i.e.  more  terms  are  added;  the  R  and  R  adjusted 
(R'adj)  increase  and  the  MSE  decreases. 


Table  9:  Regression  Data 


Std 

NxN 

CxNxN 

NxCxC 

R2 

0.6269 

0.7637 

0.8031 

0.8291 

R'adj 

0.6253 

0.7624 

0.8010 

0.8264 

MSE 

0.1157 

0.0734 

0.0615 

0.0536 
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The  optimal  settings  were  calculated  using  the  Lin  and  Tu  MSE  method  (Lin  and 
Tu,  1995)  with  the  focus  on  maximizing  the  mean  and  minimizing  the  variance  of 
TPF  +  LA.  Table  10  shows  the  optimal  RPD  settings  for  AutoGAD  for  each  of  the 
defined  models  along  with  the  suggested  setting  from  Johnson  et  al.  (2012)  and  Table  1 1 
shows  the  mean  and  variance  of  the  responses  from  the  five  test  images.  Notice  that  as 
the  model  becomes  more  complex  the  mean  increases.  Likewise,  the  variance  decreases 
as  the  model  becomes  more  complex.  The  goal  of  RPD  is  to  locate  settings  that  provide  a 
favorable  response  while  reducing  process  variance.  This  RPD  example  using  AutoGAD 
shows  a  real  world  implementation  of  a  system  that  would  benefit  from  a  more  complex 
model. 


Table  10:  Optimal  Settings  for  AutoGAD  by  Model 


Parameter 

Johnson 

Std 

N  xN 

C  x  N  x  N 

NxCxC 

Dimension  Adjust 

-1 

-1 

-1 

-1 

-1 

Bin  Width  SNR 

0.05 

0.009 

0.09 

0.01 

0.0198 

PA  SNR  Threshold 

2 

1.0002 

3.8849 

5 

1 

Max  Score  Threshold 

10 

6 

6 

6 

6 

Low  PA  SNR 

10 

7.0936 

14 

14 

14 

IAN  Filtering  Iterations  (High  SNR) 

100 

100 

100 

100 

100 

IAN  Filtering  Iterations  (Low  SNR) 

20 

20 

20 

20 

20 

Window  Size 

3 

3 

3 

3 

3 

Bin  Width  Identify 

0.05 

0.0764 

0.0618 

0.09 

0.09 

Table  11:  Result  from  Optimal  AutoGAD  Settings  by  Model 


Johnson 

Std 

NxN 

CxNxN 

NxCxC 

mean(TPF  +  LA) 

0.9459 

0.9016 

1.1675 

1.2161 

1.2263 

var(TPF  +  LA) 

0.6471 

0.3198 

0.2433 

0.2155 

0.1854 
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4.5  Conclusions 


This  research  provides  the  required  statistical  models  to  extend  the  Mindrup  et  al. 
(2012)  N  x  N  RPD  model  to  include  higher  order  terms,  including  the  C  x  N  x  N  and 
N  x  C  x  C  interaction  terms.  These  higher  order  models  were  applied  to  AutoGAD  to 
locate  better  operating  parameter  settings,  using  properties  of  the  hyperspectral  images  as 
system  noise.  The  expanded  models  provided  better  fits  to  the  data  demonstrated  through 
increased  R  adj  and  decreased  MSE.  Finally,  the  parameter  settings  located  using  these 
new  models  provided  higher  mean  responses  and  lower  response  variance,  the  overall 
goal  of  RPD. 
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5  Conclusion 


This  research  had  three  goals:  address  correlation  problems  inherent  to 
Hyperspectral  Imagery  (HSI)  that  are  often  ignored  by  the  research  community  when 
performing  anomaly  detection,  address  similar  correlation  problems  with  anomaly 
classification,  and  extend  the  Robust  Parameter  Design  (RPD)  model  to  include  three 
factor  interactions.  Chapters  2-4  outlined  each  of  these  goals  individually  and  gave 
results  to  show  the  newly  implemented  methods  achieved  the  stated  goals. 

5,1  Original  Contributions 

Chapter  2  introduced  two  new  anomaly  detectors:  Linear  RX  (LRX),  a  variant  of 
Reed  and  Yu  (1990)  RX  detector,  and  Iterative  Linear  RX  (ILRX),  a  variant  of  the 
Taitano  et  al.  (2010)  Iterative  RX  (IRX)  detector.  LRX  addresses  spatial  correlation 
related  to  RX  by  establishing  a  mean  vector  and  covariance  matrix  using  data  that  is,  on 
average,  further  from  each  other  than  the  standard  RX  window.  The  IRX  detector  allows 
for  the  exclusion  of  outliers  in  the  mean  vector  and  covariance  matrix  calculations, 
thereby  promoting  a  more  accurate  assessment  of  the  target  pixel.  ILRX  then  exploits  the 
innovations  of  both  LRX  and  IRX. 

Chapter  3  continued  addressing  correlation  in  HSI,  but  this  time  with  the  goal 
being  classification.  The  Adaptive  Matched  Filter  (AMF)  with  the  Manolakis  et  al. 

(2009)  suggested  improvements,  referred  to  as  Robust  AMF,  and  was  shown  to  be 
superior  to  the  Standard  AMF.  Additionally,  two  more  AMFs  were  created,  Clustered 
AMF  and  Largest  Cluster  AMF,  and  were  tested  against  the  Standard  AMF  and  Robust 
AMF.  Robust  AMF  addresses  the  problem  of  anomalous  pixels  skewing  the  required 
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statistics.  Clustered  AMF  and  Largest  Cluster  AMF  exploit  the  idea  of  Robust  AMF  and 
address  the  concern  of  non-homogeneity. 

Chapter  4  provided  the  required  statistical  models  to  extend  the  RPD  model  to 
include  higher  order  terms,  including  the  Control  by  Noise  by  Noise  (C  x  N  x  N)  and 
Noise  by  Control  by  Control  (N  x  C  x  C)  interaction  terms.  Applications  of  the  models 
demonstrated  increased  R~adj  and  decreased  Mean  Squared  Error  (MSE),  and  new 
parameter  settings  which  provided  higher  mean  responses  and  lower  response  variance. 

5.2  Suggested  Future  Work 

In  the  process  of  researching  correlation  and  RPD,  some  potential  extensions  to 
this  work  became  evident: 

•  A  comparison  of  the  ILRX  algorithm  with  the  wide  range  of  additional  RX  based 
methods  to  address  RX  shortcomings. 

•  The  new  classification  algorithms  take  into  account  how  anomalies  and  non¬ 
homogeneity  affect  the  mean  vector  and  covariance  matrix  estimates.  Addressing 
spatial  and  spectral  correlation  directly,  rather  than  just  within  the  anomaly 
detectors,  could  further  benefit  anomaly  classification. 

•  Determine  how  much  improvement  can  be  gained  when  employing  the  extended 
RPD  model  to  generate  new  parameters  for  the  standard  anomaly  detection 
algorithms,  such  as:  RX,  IRX,  ILRX,  SVDD,  etc. 
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Appendix 


This  appendix  details  the  derivations  of  the  expectation  and  variance  of  the  four 
separate  models  discussed  in  this  paper:  Standard  Model  (Std),  Noise  by  Noise  Model 
(N  x  N),  Control  by  Noise  by  Noise  Model  (C  x  N  x  N),  and  Noise  by  Control  by  Control 
Model  (N  x  C  x  C).  Within  each  model  x  is  an  m  x  1  vector  of  control  variables,  z  is  an 
n  x  1  random  vector  of  noise  variables  distributed  N(0,  Z_)  ,  J30  is  the  intercept,  ft  is  the 

m  x  1  vector  of  control  variable  coefficients,  B  is  the  m  x  m  matrix  of  quadratic  control 
variable  coefficients,  y  is  the  «xl  vector  of  noise  variable  coefficients,  A  is  the  m  x  n 
matrix  of  control  by  noise  variable  interaction  coefficients,  O  is  the  n  x  n  matrix  of 

m 

quadratic  noise  variable  coefficients,  'F  v  where  Tf  is  an  n  x  n  matrix  of  control 

;=  1 

by  noise  by  noise  coefficients  corresponding  to X/,  x  =  [x’QjX,  x'Q2x,  ...  ,x'Qnx]  where 
Q  .  is  an  m  x  m  matrix  of  noise  by  control  by  control  coefficients  corresponding  to  z/;  and 
the  random  error,  s,  associated  with  the  model  is  assumed  to  be  distributed  N  ( t),  o2  j . 

A.  1  Standard  Model  (Myers  et  ah,  2009) 

T(std)  =  A>  +x'j3  +  x'Bx  +  (y'+x'A)z  +  £ 

Since  the  fi(z)  =  0  and  the  E(s)  =  0 

E(y(stA))  =  fio+x'fi  +  x'Bx 
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Since  fio  and  x  are  constants 

var(  V(Std))  =  var((/'+  x'A)z  +  e) 

=  (/'+x'A)var(z)(f '+x'A)'+cr2 
=  (y'+x'A)Zz(y'+x'A)'+  cr2 

A.2  Noise  by  Noise  Model  (Mindrup  et  ah,  2012) 

J(n,n)  =  Pa  +*'  P  +  x'Bx  +  (y'+x'A)z  +  z'Q>z  +  s 

E[yyixn)]  =  E{Po  +x'/3  +  x'Bx  +  (y'+x'A)z  +  z'®z  +  s) 

=  J30  +  x'  J3  +  x'Bx  +  fs(z'®z) 

Ifx  is  distributed  N(0,  F) ,  then  the  2s[xMx]  =  tr(  A  V  )  where  tr  is  the  trace  of  a  matrix 

(Searle,  1971).  Therefore, 

E  (t(n  x  N)  )  =  Po  +  *' 'P  +  *' 'Bx  + tr  ) 

var^  V(NxN)  j  =  var(/?0  +x'/3  +  x'Bx  +  (y  +  x'A)z  +  z’Oz  +  s} 

=  (y'+  x'A)Ez(/  '+x’A)'+cr2  +  var(z'Oz)  +  2cov((f '+ x'A)z,z'Oz) 

Ifx  is  distributed  N(0,  V ) ,  var[x'dx]  =  2tr(  AV\  where  tr  is  the  trace  of  a  matrix 
(Searle,  1971).  Therefore, 
var(z'chz)  =  2tr(®X_)2 

Theorem  1.  Let  z  =  zTz ,  where  z  is  an  n  x  1  random  vector  such  that  Zf(z)  =  0  and 
var(z  )  =  X  and  T  is  a  constant  matrix  of  size  n  x  n.  If  a  is  a  constant  vector  of  size 
n  x  1,  then  cov(a’z,z)  =  0  . 
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Proof. 


cov(a'z,z)  =  cov(a'z,zTz) 

=  £(a'zzTz)-£(a'z)£(zTz) 
=  fs(«'zzTz) 


=E 


f  n  n  n 

zzz« 

y  /= 1  j= 1  *=1 


*Z*Z,7 


J •  Ji  > 


=£££«* Aw) 

i= 1  y=l  £=1 


The  resulting  summation  consists  of  three  different  types  of  expectation  terms:  /:  [z^]  , 
E^zlzbj,  and  E[zazhzc] .  Anderson  (2003)  showed  that  all  three  types  of  expectation  are 


zero  because  with  multivariate  normal  data  the  third  moment  is  equal  to  zero.  Therefore, 
cov(a’z,z)  =  0  □. 


Letting  a'  =  (y'+  x'A)  the  covariance  tenn  from  var 


is 


cov((f  ’+x'A)z,z'Oz)  =  cov(a'z,z)  =  0,  by  Theorem  1. 

The  resulting  variance  model  is 

var(.v(NxN))  =  (/'+*'A)Z.(f '+x'A)'+2tr((DZ,)2  +cr2 


A.  3  Control  x  Noise  x  Noise  Model 

T(cxNxn)  =  A>  +*'/?  +  *'Bx  +  (f'+  x'A)z  +  z'(®  +  X¥x)z  +  e 

E(y(CxUxN))  =  E(fio+X'fi  +  X'Bx  +  {r'+x'&)z  +  z'(®  +  Xi'x)z  +  £) 

=  j30  +  x'j3  +  x'Bx  +  tr((®  +  '¥x)Zz ) 
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Additionally,  from  the  noise  by  noise  model 

var(%xNXN))  =  var(A+^'^  +  ^'Bx  +  (y'+x,A)z  +  z,(a)  +  Tx)z  +  ^) 

=  (f '+  x'A)Sz(y 1 '+  x'A)'+  2tr((cD  +  xFr)S.)2  +  a2  + ... 
2cov((y'+x'A)z,z’(®  +  'Fjz) 

Letting  a'  =  (y'+  x'A)  and  T  =  (d>  +  VLX),  then  by  Theorem  1, 

cov((y'+x'A)z,z'(®  +  vFr)z)  =  cov(a’z,zTz)  =  cov(a’z,z)  =  0 

Therefore,  the  resulting  variance  model  is, 

'’arh(C,N«N))  =  (r'+^'A)E,()''+^'A)'+2&-((4>  +  'F,)EI)!+CT2 

A.  4  Noise  x  Control  x  Control  Model 

T(nxCxc)  =  Ai  +x’/?  +  x'Bx  +  (y'+x'A  +  x)z  +  z’(0  +  'Tv)z  +  ^ 

E{y(  vxc*c))= E(fo + x'  P + x'Bx +(r'+  x' A+ *)z + z'(°  +  Xi>  *)z + £) 

=  j60  +  x'j8  +  x'Bx  +  tr((®  +  '¥x)?lz) 

var(y(N  x  c  x  c))  =  var  (/?0  +  x'/?  +  x’Bx  +  (y'+  x'A  +  x)z  +  z'(cD  +  Tjz  +  s') 

=  (y'+x'A  +  x)Ez  (y '+  x’A  +  x)'+  2*r((®  +  'Fjr)Ez)2  +  a2  + ... 
2cov((y'+x'A  +  x)z,z'(®  +  vFx)z) 

Letting  a'  =  (y'+  x'A  +  x)  and  T  =  (O  +  xFi.),  them  by  Theorem  1, 

cov((y'+x'A  +  x)z,z’(®  t-Tjz)  =  cov(a’z,zTz)  =  cov(a’z,z)  =  0 

Therefore,  the  resulting  variance  model  is 

Hv(N«cxc))  =  y'+*,A  +  n2I(r'+^,A  +  i)'+2n-((0  +  'P,)EI)2+CT! 
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